Preface

These are the notes that have grown out of an introductory graduate course I have given for the 
past few years at the IfA. They are meant to be a ‘primer’ for students embarking on a Ph.D. in 
astronomy. The level is somewhat shallower than standard textbook courses, but quite a broad 
range of material is covered. The goal is to get the student to the point of being able to make 
meaningful order-of-magnitude calculations (a number of problems are included) and to 
give the students a fairly uniform base in the relevant physics that they can use as a starting point 
and introduction to the more detailed textbooks they will need to use when they come to address 
serious problems. The books that I have drawn upon extensively in devising this course are Rybicki 
and Lightman; Shu’s two-volume series; Longair’s two-volume series; various sections of Landau and 
Lifshitz (Classical theory of fields; Mechanics and Fluid Mechanics in particular); Huang’s statistical 
mechanics, and Binney and Tremaine. Some sections here are rather terse overviews of the relevant 
parts of these texts, but there are some other areas which I felt were not adequately covered, where I 
have tried to give more elaborate coverage. The reader is strongly encouraged to consult these texts 
along with the present work and particularly to attempt the relevant problems contained in many 
of these. 

The book is organised in the following sections: 

• Preliminaries - We review aspects of special relativity, Lagrangian and Hamiltonian dynamics, 
and the mathematics of random processes. 

• Radiation - The course follows quite closely the first few chapters of Rybicki and Lightman. 
We review the macroscopic properties of electromagnetic radiation; we briefly review the 
concepts of radiative transfer and then consider the properties of thermal radiation and show the 
connection between the Planck spectrum and Einstein’s discovery of stimulated emission. 
The treatment of polarization is done somewhat differently and more attention 
is given to radiation propagation both in the geometric optics limit and via diffraction theory. 
This section concludes with a general discussion of radiation from moving charges, followed 
by specific chapters for the important radiation mechanisms of bremsstrahlung, synchrotron 
radiation and Compton scattering. 

• Field Theory - Initially an informal introduction to the ‘matter’ section, this has now expanded 
to become a major part of the book. 

• Matter - Starting with the reaction cross sections as computed from field theory we develop 
kinetic theory and the Boltzmann transport equation, which in turn forms the basis for fluid 
dynamics. The goal is to show how the approximate, macroscopic theory is based on 
fundamental physics. We then consider ideal fluids; viscous fluids; fluid instabilities and supersonic 
flows and shocks. Also covered here is the propagation of electromagnetic waves in a plasma. 

• Gravity - We start with a brief review of Newtonian gravity and review properties of simple 
spherical model systems. We then consider collisionless dynamics, with particular emphasis 
on its use for determining masses of astronomical systems. 

• Cosmology - We then consider cosmology, cosmological fluctuations and gravitational lensing 
(TBD). 

• Appendices - In an attempt to make the course self-contained I have included some basic 
results from vector calculus and Fourier transform theory. There is a brief and simple review 
of the Boltzmann formula, and appendices on dispersive waves, the relativistic covariance of 
electromagnetism, and complex analysis. 

There are still some major holes in the syllabus. Little attention is given to neutron stars and 
black holes, for instance, nor to accretion disk theory or MHD. These shortcomings reflect the interests 
of the author. 

I am continually finding errors in the text, and am grateful to the students who have suffered 
through the course and have pointed out many other errors. 
Chapter 1 

Special Relativity 


A remarkable feature of Maxwell’s equations is that they support waves with a unique velocity c, 
yet there is no ‘underlying medium’ with respect to which this velocity is defined (in contrast to say 
sound waves in a physical medium). An equally remarkable observational fact is that the velocity of 
propagation of light is indeed independent of the frame of reference of the observer or of the source 
(Michelson and Morley experiment). Searches for the expected aether drift proved unsuccessful. 
These results would seem to conflict with Galilean relativity in which there is a universal time, and 
universal Cartesian spatial coordinates such that each event can be assigned coordinates on which 
all observers can agree. Einstein’s special theory of relativity makes sense of these results. The 
result is a consistent framework in which events in space-time are assigned coordinates, but where 
the coordinates depend on the state of motion of the observer. The situation is rather analogous 
to that in planar geometry, where the coordinates of a point depend on the origin and rotation 
of one’s chosen frame of reference. However, one can also use vector notation to express relations 
between lines and points (e.g. a + b = c) which are valid for all frames of reference. In special 
relativity the fundamental quantities are points, or ‘events’, which are vectors in a 4-dimensional 
space-time. We will see how these ‘4-vectors’ transform under changes in the observer’s frame of 
reference, and how particle velocities, momenta and other physical quantities can be expressed in 
the language of 4-vector notation. Indeed the fundamental principle of relativity is that all of the 
laws of physics can be expressed in a frame invariant manner. The last part of the chapter deals 
with the transformation properties of distribution functions (e.g. the density of particles in space, 
or the distribution of particles over energy, velocity etc). 


1.1 Time Dilation 

An immediate consequence of the frame-independence of the speed of light is that observers in 
relative motion with respect to one another must assign different time separations to events. 

Consider an observer A with a simple gedanken clock consisting of a photon bouncing between 
mirrors attached to the ends of a standard rod of length $l_0$ as illustrated in figure 1.1. One round 
trip of the photon takes an interval $\Delta t_0 = 2 l_0 / c$ in the ‘rest-frame’ of the clock. Now consider the 
same round trip as seen from the point of view of an observer B moving with some relative velocity 
v in a direction perpendicular to the rod. 

First, note that A and B must assign the same length to the rod. To see this imagine B carries 
an identical rod, also perpendicular to his direction of motion, with pencils attached which make 
marks on A’s rod as they pass. Since the situation is completely symmetrical, the marks on A’s 
rod must have the same separation as the pencils on B’s. Thus transverse spatial dimensions are 
independent of the state of motion of the observer. 

From B’s point of view, then, the distance traveled by the photon in one round trip must exceed 
$2 l_0$, and consequently the time interval between the photon’s departure and return is $\Delta t > \Delta t_0$. In 
this time, A’s rod has moved a distance $v \Delta t$, so, by Pythagoras’ theorem, the total distance traveled 
by the photon is $2 \sqrt{l_0^2 + (v \Delta t / 2)^2} = c \Delta t$, and solving for $\Delta t$ gives 

$$ \Delta t = \Delta t_0 / \sqrt{1 - v^2/c^2} \qquad (1.1) $$

or, defining the Lorentz gamma-factor 

$$ \gamma \equiv 1 / \sqrt{1 - v^2/c^2} \qquad (1.2) $$

we have the time dilation formula 

$$ \Delta t = \gamma \Delta t_0. \qquad (1.3) $$

Figure 1.1: Illustration of time-dilation. Left panel shows one click of the gedanken clock in A’s frame 
of reference. Right panel shows the path of the photon as seen by a moving observer. Evidently, if 
the clock is moving the photon has to travel further, so the interval between clicks for a moving clock 
is greater than if the clock is at rest. The time dilation factor is $\gamma = 1/\sqrt{1 - v^2/c^2}$. 


Thus moving clocks run slow by the factor $\gamma$. This is a small correction for low velocities, but 
becomes very large as the velocity approaches $c$. This behavior is not paradoxical, since the situation 
is not symmetrical; the two events (departure and return) occur at the same point of space in A’s 
frame of reference, whereas they occur at different positions in B’s frame. The spatial separation of 
the two events in B’s frame is $\Delta x = v \Delta t$, and the temporal separation is $\Delta t = \Delta t_0 / \sqrt{1 - v^2/c^2}$, and 
so we have 

$$ \Delta t^2 - \Delta x^2 / c^2 = \Delta t_0^2. \qquad (1.4) $$

The quantity 

$$ \Delta t_0 = \sqrt{\Delta t^2 - \Delta x^2 / c^2} \qquad (1.5) $$

is called the proper time interval between two events. It is invariant; i.e. it is the same for all observers 
in a state of constant relative motion, even though they assign different spatial and temporal intervals 
to the separation of the events. It is equal to the temporal separation of the events as measured 
by an observer for whom the two events occur at the same point in space. Any other observer will 
assign a greater temporal separation. 
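The time dilation formula and the invariance of the proper time interval can be checked numerically. The following is a minimal sketch; the function names `lorentz_gamma` and `proper_time`, the rod length of 1 m and the speed $v = 0.6c$ are illustrative assumptions, not part of the notes.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_gamma(v, c=C):
    """Lorentz factor 1/sqrt(1 - v^2/c^2) for a speed v < c."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def proper_time(dt, dx, c=C):
    """Proper time interval sqrt(dt^2 - dx^2/c^2) of a time-like separation."""
    return math.sqrt(dt ** 2 - (dx / c) ** 2)


# Light clock with an (assumed) 1 m rod, seen by an observer moving at 0.6 c.
l0 = 1.0
v = 0.6 * C
dt0 = 2.0 * l0 / C             # tick interval in the clock's rest frame
dt = lorentz_gamma(v) * dt0    # dilated tick interval, equation (1.3)
dx = v * dt                    # distance the clock moves between ticks

print("gamma =", lorentz_gamma(v))
print("proper time / dt0 =", proper_time(dt, dx) / dt0)
```

The second printed ratio should be unity to rounding error, which is just equation (1.4) rearranged.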


1.2 Length Contraction 

We can derive the Lorentz-Fitzgerald length contraction formula in a very similar fashion. Let us 
equip A with a pair of clocks mounted back to back so that a pair of photons repeatedly depart from 
A, travel equal and opposite distances $l_0$, bounce off mirrors, and then return to A. A space-time 
diagram of one cycle of the clock as perceived by A is shown on the left hand side of figure 1.2. 

The same set of events as perceived by an observer B now moving with constant velocity parallel 
to the arms of the clock is shown on the right for the case of $v = c/2$. Set the origin of coordinates at 
the emission point, and let the length of the arms in B’s rest frame be $l$. The two photons propagate 
along trajectories $x = \pm ct$ until they reach the mirrors, which have world lines $x = \pm l + vt$. Solving 
for the reflection times $t_\pm$ and locations $x_\pm$ gives 

$$ c t_\pm = \pm x_\pm = \frac{l}{1 \mp v/c}. \qquad (1.6) $$

The return flight of each photon is the same as the outward flight of the other photon, so the total 
time $\Delta t$ elapsed between departure and return to A satisfies 

$$ c \Delta t = c (t_+ + t_-) = l \left[ \frac{1}{1 - v/c} + \frac{1}{1 + v/c} \right] = \frac{2 l}{1 - v^2/c^2} = 2 \gamma^2 l. \qquad (1.7) $$

However, we know that $\Delta t = \gamma \Delta t_0 = 2 \gamma l_0 / c$ and hence we obtain the Lorentz-Fitzgerald length 
contraction formula 

$$ l = l_0 / \gamma. \qquad (1.8) $$

This is sometimes stated as moving rods appear foreshortened. More precisely, we have shown 
that two events which occur at the same time in one frame and have separation $l$ in that frame will 
have a spatial separation in a relatively moving frame of $l_0 = \gamma l > l$. Consider, for example, the 
reflection events. In A’s frame these occur at the same time, so $\Delta t_0 = 0$, and have spatial separation 
$\Delta x_0 = 2 l_0$. In B’s frame however, they have temporal separation (times $c$) of 

$$ c \Delta t = c (t_+ - t_-) = 2 \gamma^2 l v / c = 2 \gamma l_0 v / c \qquad (1.9) $$

and spatial separation 

$$ \Delta x = x_+ - x_- = 2 \gamma^2 l = 2 \gamma l_0. \qquad (1.10) $$

Evidently 

$$ \Delta x^2 - c^2 \Delta t^2 = 4 l_0^2 \gamma^2 (1 - v^2/c^2) = (2 l_0)^2 = \Delta x_0^2 \qquad (1.11) $$

is also an invariant. The quantity 

$$ \Delta x_0 = \sqrt{\Delta x^2 - c^2 \Delta t^2} \qquad (1.12) $$

is known as the proper distance between the two events. 

Finally, the area of the region enclosed by the photon world lines is, in A’s frame, $A_0 = (\sqrt{2}\, l_0)^2$, 
whereas in B’s frame 

$$ A = (\sqrt{2}\, x_+)(\sqrt{2}\, |x_-|) = \frac{2 l_0^2}{\gamma^2 (1 - v^2/c^2)} = A_0 \qquad (1.13) $$


so this area is an invariant. Since transverse dimensions are invariant, this means that the space-time 
4-volume is an invariant. 
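The results of this section can be verified by direct computation of the reflection events in B's frame. This is a minimal sketch in units with $c = 1$; the function name `reflection_events` and the sample values $l_0 = 1$, $\beta = 0.5$ are illustrative assumptions.

```python
import math


def reflection_events(l0, beta):
    """Reflection events (t_plus, x_plus, t_minus, x_minus) in the frame where
    the clock moves with speed beta along its arms (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    l = l0 / gamma               # contracted arm length, equation (1.8)
    t_plus = l / (1.0 - beta)    # photon x = +t meets mirror x = +l + beta*t
    t_minus = l / (1.0 + beta)   # photon x = -t meets mirror x = -l + beta*t
    return t_plus, t_plus, t_minus, -t_minus


l0, beta = 1.0, 0.5
gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
tp, xp, tm, xm = reflection_events(l0, beta)

round_trip = tp + tm                                          # equation (1.7)
proper_distance = math.sqrt((xp - xm) ** 2 - (tp - tm) ** 2)  # equation (1.12)
area = (math.sqrt(2.0) * xp) * (math.sqrt(2.0) * abs(xm))     # equation (1.13)

print(round_trip, 2.0 * gamma * l0)   # both equal the dilated tick, 2*gamma*l0
print(proper_distance, 2.0 * l0)      # invariant proper distance
print(area, 2.0 * l0 ** 2)            # invariant area
```

Each printed pair should agree to rounding error for any $|\beta| < 1$.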
Figure 1.2: Space-time diagram used to derive length contraction. On the left are shown the photon 
trajectories (wiggly diagonal lines) departing from the source S (whose world-line is the central 
vertical line) and reflecting from mirrors $M_-$, $M_+$ (also with vertical world-lines) and returning to 
the source S. This is for a stationary clock, and the interval between clicks of the clock (the time 
between departure and return of the photons) is $c \Delta t = 2 l_0$. The slanted lines on the right show 
the world lines for the source/receiver and mirrors for a clock which is moving at a constant velocity 
($v = c/2$ in this case). Since the speed of light is invariant, the photons still move along 45-degree 
diagonal lines. Now we have already seen that the interval between clicks for a moving clock is 
larger than that for a stationary clock by a factor $\gamma$. This means that the time between emission 
and return for the moving clock is $c \Delta t = 2 \gamma l_0$. It is then a matter of simple geometry (see text) to 
show that the distance between the source and the mirrors $l$ is smaller than that for the stationary 
clock by a factor $\gamma$, or $l = l_0 / \gamma$. This is the phenomenon of relativistic length contraction; if we have 
a metre rod moving in a direction parallel to its length then at a given time the distance between 
the ends of the rod is less than 1 metre by a factor $1/\gamma$. The above description is of two different 
clocks, viewed in a single coordinate system. There is a different, and illuminating, way 
to view the above figure. We can think of these two pictures as being of the same clock, and 
indeed the very same set of emission, reflection and reception events, but as viewed from two 
different frames of reference. The left hand picture shows the events as recorded by an observer who 
sees the clock as stationary while the right hand picture shows the events as recorded by an observer 
moving at velocity $v = c/2$ with respect to the clock. Now consider the reflection events, labelled 
$r_-$ and $r_+$. In the clock-frame these events have spatial separation $\Delta x = 2 l_0$, while in the moving 
frame simple geometrical analysis shows that the spatial separation is $\Delta x = 2 \gamma l_0$. Now we are 
saying that the separation of the mirrors is larger for the moving clock, whereas before we were 
saying the moving clock’s rods were contracted. This sounds contradictory, or paradoxical, but it 
isn’t really. The resolution of the apparent paradox is that the situation is again non-symmetrical 
between the two frames. The reflection events occur at the same time in the clock’s rest frame, and 
the separation is the so-called ‘proper-separation’ $\Delta x_0 = 2 l_0$. In the moving frame the two events 
have a time coordinate difference $\Delta t \neq 0$, and the spatial separation, as we show in the text, is now 
$\Delta x = \sqrt{\Delta x_0^2 + c^2 \Delta t^2}$. In the earlier discussion we were computing the distance between the two 
events $e$, $e'$ which occur at the same time in the moving frame. These events in the rest frame do not occur 
at the same coordinate time. 
Figure 1.3: The Lorentz transformation causes a shearing in the $x$-$t$ space. This was shown above 
for an area bounded by null curves, but the result is true for arbitrary areas. 


1.3 Lorentz Transformation 


Figure 1.2 shows that the effect of a boost on the area in the $x$-$ct$ plane bounded by the photon 
paths is to squash it along one diagonal direction and to stretch it along the other. The same is true 
for any area, as illustrated in figure 1.3. In fact, one can write the transformation for the 
coordinates as multiplication by a matrix, whose coefficients are a function of the boost velocity. 

In this section it will prove convenient to work in units such that $c = 1$ (or equivalently let $t' = ct$ 
and drop the prime), so photon world lines are diagonals in $x$-$t$ space. 

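Let’s now determine the form of this transformation matrix for the case of a boost along the 
$x$-axis. For such a boost, we know that the $y$ and $z$-coordinates are unaffected, so we need only 
compute how the $x$ and $t$ coordinates are changed. Consider first what happens if we take the $x$-$t$ 
plane and rotate it by 45 degrees. Specifically, let’s define new coordinates 

$$ \begin{bmatrix} T \\ X \end{bmatrix} = R(45°) \begin{bmatrix} t \\ x \end{bmatrix} = \begin{bmatrix} \cos(45°) & -\sin(45°) \\ \sin(45°) & \cos(45°) \end{bmatrix} \begin{bmatrix} t \\ x \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} t \\ x \end{bmatrix} \qquad (1.14) $$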


Now we saw in the previous section that in this rotated frame the effect of a boost is just a stretch in 
the horizontal direction with scale factor $S_+ = x_+ / l_0 = 1 / [\gamma (1 - v)]$ and a contraction in the vertical 
direction with scale factor $S_- = |x_-| / l_0 = 1 / [\gamma (1 + v)]$. The effect of the boost on position vectors in 
this 45-degree rotated system is just multiplication by the 2 x 2 matrix 

$$ M = \begin{bmatrix} S_+ & 0 \\ 0 & S_- \end{bmatrix} \qquad (1.15) $$


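or, denoting coordinates in the boosted frame by a prime, 

$$ \begin{bmatrix} T' \\ X' \end{bmatrix} = \begin{bmatrix} S_+ & 0 \\ 0 & S_- \end{bmatrix} \begin{bmatrix} T \\ X \end{bmatrix} = \begin{bmatrix} S_+ T \\ S_- X \end{bmatrix} \qquad (1.16) $$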


The effect of these linear transformations is illustrated in figure 1.4. 

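So far we have obtained the linear transformation matrix for transforming the rotated $X$-$T$ 
coordinates. What we really want is the matrix that transforms un-rotated $x$-$t$ coordinates. This 
is readily found since we have 

$$ \begin{bmatrix} t' \\ x' \end{bmatrix} = R^{-1} \begin{bmatrix} T' \\ X' \end{bmatrix} = R^{-1} M \begin{bmatrix} T \\ X \end{bmatrix} = R^{-1} M R \begin{bmatrix} t \\ x \end{bmatrix} \qquad (1.17) $$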


with $R = R(45°)$ the rotation matrix for a 45 degree rotation. Evidently, the transformation from 
$x$-$t$ to boosted $x'$-$t'$ coordinates is effected by multiplying by the matrix $M' = R^{-1} M R$, or 

$$ M' = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} S_+ & 0 \\ 0 & S_- \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} S_+ + S_- & -S_+ + S_- \\ -S_+ + S_- & S_+ + S_- \end{bmatrix} \qquad (1.18) $$

Figure 1.4: A region of 2-dimensional $x$-$t$ space-time bounded by photon world lines is shown in 
the left hand panel. The center panel shows the same region in $X$-$T$ coordinates, which are just 
$x$-$t$ coordinates rotated through 45°. The right panel shows the same region after applying a boost 
along the $x$-axis. 


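but $S_+ + S_- = 2 \gamma$ and $S_+ - S_- = 2 \gamma \beta$ where $\beta = v/c$. Therefore the transformation of $x$-$t$ 
coordinate vectors induced by a boost of dimensionless velocity $\beta$ is 

$$ \begin{bmatrix} t' \\ x' \end{bmatrix} = \begin{bmatrix} \gamma & -\gamma \beta \\ -\gamma \beta & \gamma \end{bmatrix} \begin{bmatrix} t \\ x \end{bmatrix} = \begin{bmatrix} \gamma (t - \beta x) \\ \gamma (x - \beta t) \end{bmatrix} \qquad (1.19) $$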


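Finally, recalling that transverse dimensions $y$, $z$ are unaffected by a boost in the $x$-direction we 
obtain the full transformation as a 4 x 4 matrix multiplication 

$$ \begin{bmatrix} t' \\ x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \gamma & -\gamma \beta & 0 & 0 \\ -\gamma \beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} t \\ x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \gamma (t - \beta x) \\ \gamma (x - \beta t) \\ y \\ z \end{bmatrix} \qquad (1.20) $$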


This is known as the Lorentz transformation. 
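The construction of the boost matrix as a rotation, a diagonal stretch and an inverse rotation can be checked numerically. The sketch below uses plain nested lists for the 2 x 2 matrices; the helper names (`matmul`, `boost_via_rotation`, `boost_direct`) and the value $\beta = 0.5$ are illustrative assumptions.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def boost_via_rotation(beta):
    """Boost matrix for (t, x) built as R^-1 M R, following (1.14)-(1.18)."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    s_plus = 1.0 / (gamma * (1.0 - beta))    # stretch factor S_+
    s_minus = 1.0 / (gamma * (1.0 + beta))   # contraction factor S_-
    r = 1.0 / math.sqrt(2.0)
    rot = [[r, -r], [r, r]]                  # R(45 degrees)
    rot_inv = [[r, r], [-r, r]]              # inverse rotation (the transpose)
    stretch = [[s_plus, 0.0], [0.0, s_minus]]
    return matmul(rot_inv, matmul(stretch, rot))


def boost_direct(beta):
    """Boost matrix for (t, x) written down directly from (1.19)."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[gamma, -gamma * beta], [-gamma * beta, gamma]]


beta = 0.5
via_rotation = boost_via_rotation(beta)
direct = boost_direct(beta)
max_diff = max(abs(via_rotation[i][j] - direct[i][j])
               for i in range(2) for j in range(2))
det = direct[0][0] * direct[1][1] - direct[0][1] * direct[1][0]

print("largest element difference:", max_diff)
print("determinant:", det)
```

The determinant comes out as unity, as expected from the invariance of area in the $x$-$t$ plane.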


1.4 Four-vectors 

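The prototype 4-vector is the separation between two space-time events 

$$ x^\mu = \begin{bmatrix} ct \\ x \\ y \\ z \end{bmatrix} \qquad (1.21) $$

which transforms under a boost $v/c = \beta$ as 

$$ x'^\mu = \Lambda^\mu{}_\nu x^\nu \qquad (1.22) $$

where the 4 x 4 transformation matrix $\Lambda^\mu{}_\nu$ is 

$$ \Lambda^\mu{}_\nu = \begin{bmatrix} \gamma & -\gamma \beta & & \\ -\gamma \beta & \gamma & & \\ & & 1 & \\ & & & 1 \end{bmatrix} \qquad (1.23) $$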


Summation of repeated indices is implied, and such summations should generally involve one 
subscript index and one superscript index. 

Alternative notation for four vectors is 

$$ x^\mu = \vec{x} = (ct, \mathbf{x}) = (ct, x^i) \qquad (1.24) $$

where $i = 1, 2, 3$. 

We have seen that the Lorentz transformation matrix $\Lambda$ corresponds to a diagonal shearing in the 
$x$-$t$ subspace. The determinant of $\Lambda$ is unity, in accord with the invariance of space-time volume 
previously noted. 

This is for a boost along the $x$-axis. The transformation law for a boost in another direction can 
be found by multiplying matrices for a spatial rotation and a boost. There are also generalizations 
of (1.20) which allow for reflections of the coordinates (including time). See any standard text for 
details. 

The vector $x^\mu$ is a contravariant vector. It is also convenient to define a covariant 4-vector which 
is equivalent, but is defined as 

$$ x_\mu = \begin{bmatrix} -ct \\ x \\ y \\ z \end{bmatrix} \qquad (1.25) $$

with a subscript index to distinguish it. 

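The two forms of 4-vector can be transformed into each other by multiplying by a 4 x 4 matrix 
called the Minkowski metric 

$$ \eta_{\mu\nu} = \eta^{\mu\nu} = \begin{bmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{bmatrix} \qquad (1.26) $$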


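since clearly $x_\mu = \eta_{\mu\nu} x^\nu$ and $x^\mu = \eta^{\mu\nu} x_\nu$. 

There is a version of the Lorentz matrix $\Lambda_\mu{}^\nu$ that transforms covariant 4-vectors as $x'_\mu = \Lambda_\mu{}^\nu x_\nu$ 
which is related to $\Lambda^\mu{}_\nu$ by 

$$ \Lambda_\mu{}^\nu = \eta_{\mu\alpha} \Lambda^\alpha{}_\beta \eta^{\beta\nu}. \qquad (1.27) $$

The norm of the 4-vector $\vec{x}$ is defined as 

$$ s^2 = \vec{x} \cdot \vec{x} = x_\mu x^\mu = -c^2 t^2 + x^2 + y^2 + z^2 = \mathbf{x} \cdot \mathbf{x} - c^2 t^2 \qquad (1.28) $$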

which we recognize as the squared invariant proper separation of the events. It can be computed as $s^2 = 
\eta_{\mu\nu} x^\mu x^\nu$ etc. If the norm is positive, negative or zero the separation is said to be ‘space-like’, 
‘time-like’ or ‘null’ respectively. Since the norm is invariant, a separation which is space-like in one frame 
will be space-like in all inertial frames etc. 

A 4-component entity A is a four-vector if it transforms in the same way as x under boosts (as 
well as spatial transformations, rotations etc). 

The scalar product of two four vectors $\vec{A}$, $\vec{B}$ is defined as 

$$ \vec{A} \cdot \vec{B} = A_\mu B^\mu = -A^0 B^0 + \mathbf{A} \cdot \mathbf{B} \qquad (1.29) $$

and it is easy to show that the scalar product is invariant under Lorentz transformations. 
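The invariance of the scalar product is easy to confirm numerically for a boost along the $x$-axis. This is a minimal sketch in units with $c = 1$; the function names `boost_x` and `dot`, and the sample vectors, are illustrative assumptions.

```python
import math

# Diagonal of the Minkowski metric with signature (-, +, +, +), as in (1.26).
ETA = (-1.0, 1.0, 1.0, 1.0)


def boost_x(vec, beta):
    """Apply the Lorentz boost (1.20) along x to a 4-vector (a0, a1, a2, a3)."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    a0, a1, a2, a3 = vec
    return (gamma * (a0 - beta * a1), gamma * (a1 - beta * a0), a2, a3)


def dot(a, b):
    """Scalar product -A0*B0 + A.B of two 4-vectors, equation (1.29)."""
    return sum(e * x * y for e, x, y in zip(ETA, a, b))


beta = 0.8
a = (2.0, 1.0, -0.5, 0.3)
b = (1.5, -0.2, 0.7, 1.1)

before = dot(a, b)
after = dot(boost_x(a, beta), boost_x(b, beta))
print(before, after)   # the two values agree to rounding error
```

Setting `b = a` gives the norm of a single 4-vector, so the same check covers the invariance of $s^2$.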

The gradient operator in space-time is a covariant vector since we require that the difference 
between the values of some scalar quantity $f$ at two neighboring points 

$$ df = d\vec{x} \cdot \vec{\nabla} f = dx^\mu \frac{\partial f}{\partial x^\mu} \qquad (1.30) $$

should transform as a scalar (i.e. be invariant). We often write the gradient operator as $\vec{\nabla} = \partial_\mu = 
\partial / \partial x^\mu$. We will also use the notation $\partial_\mu f = f_{,\mu}$ to denote partial derivatives with respect to 
space-time coordinates. 

A 4 x 4 matrix $T^{\mu\nu}$ is a contravariant rank-2 tensor if its components transform in the same 
manner as $A^\mu B^\nu$. Covariant rank-2 tensors $T_{\mu\nu}$ or mixed rank-2 tensors $T^\mu{}_\nu$ are defined similarly, 
as are higher rank tensors. 
• Example rank-2 tensors are $\eta_{\mu\nu}$ and $\delta^\mu{}_\nu$ (the Kronecker $\delta$-symbol). These are both constant; 
the components have the same numerical value in all inertial frames. Other examples are the 
outer product of a pair of vectors $A^\mu B^\nu$, and the gradient of a vector field $\partial_\mu A^\nu$. 

• Tensors can be added, so $A^{\mu\nu} + B^{\mu\nu}$ is a tensor. 

• Higher order tensors can be obtained by taking outer products of tensors, such as $T^{\mu\nu\sigma\tau} = 
A^{\mu\nu} B^{\sigma\tau}$. 

• Indices can be raised and lowered with the Minkowski metric. 

• Pairs of identical indices can be summed over to construct tensors or vectors of lower rank 
by contraction. For example, one can make a vector by contracting a (mixed) rank-3 tensor: 
$A^\mu = T^{\mu\nu}{}_\nu$. It is important that one contract on one ‘upstairs’ and one ‘downstairs’ index. If 
necessary, one should raise or lower an index with the Minkowski metric. 

• The fundamental principle of special relativity is that all of the laws of physics can be expressed 
in terms of 4-vectors and tensors in an invariant manner. 


1.5 The 4-velocity 


The coordinates of a particle are a 4-vector, as is the difference of the coordinates at two points 
along its world-line. For two neighboring points or ‘events’, we can divide by the proper time $d\tau$ 
between the events, i.e. the interval between the events as measured by an observer moving with the 
particle, which is a scalar, to obtain the 4-velocity 

$$ U^\mu = \frac{dx^\mu}{d\tau} \qquad (1.31) $$

which is a contravariant 4-vector. 

If the particle has 3-velocity $\mathbf{u}$ relative to our inertial frame, then the two events in our frame 
have temporal coordinate separation $dt = \gamma_u d\tau$, with $\gamma_u = 1 / \sqrt{1 - \mathbf{u} \cdot \mathbf{u} / c^2}$ as usual, and hence 
the particle’s 4-velocity is related to its coordinate velocity by $U^0 = dx^0 / d\tau = c\, dt / d\tau = c \gamma_u$ and 
$U^i = dx^i / d\tau = \gamma_u dx^i / dt$ or 

$$ \vec{U} = \gamma_u \begin{bmatrix} c \\ \mathbf{u} \end{bmatrix} \qquad (1.32) $$


If we undergo a boost along the $x$-axis of velocity $\beta = v/c$ into some new inertial frame then the 
components of the particle’s 4-velocity transform as 

$$ \begin{aligned} U'^0 &= \gamma (U^0 - \beta U^1) \\ U'^1 &= \gamma (U^1 - \beta U^0) \\ U'^2 &= U^2 \\ U'^3 &= U^3 \end{aligned} \qquad (1.33) $$


These relations can be used to show how speeds and velocities of particles transform under 
boosts of the observer’s frame of reference as follows: The first of equations (1.33) with $U^0 = \gamma_u c$, 
$U^1 = \gamma_u u^1$ etc and with $x$-component of the coordinate velocity in the unprimed frame $u^1 = u \cos\theta$ 
gives 

$$ \gamma_{u'} = \gamma \gamma_u \left( 1 - \frac{u v}{c^2} \cos\theta \right) \qquad (1.34) $$


which allows one to transform the particle’s Lorentz factor $\gamma_u$, and therefore also the particle’s speed 
$|\mathbf{u}|$, under changes in inertial frame. 

The second of equations (1.33) gives $\gamma_{u'} u'^1 = \gamma (\gamma_u u^1 - \beta c \gamma_u) = \gamma \gamma_u (u^1 - v)$ or, dividing by (1.34) 
with $u \cos\theta = u^1$, 

$$ u'^1 = \frac{u^1 - v}{1 - v u^1 / c^2} \qquad (1.35) $$

which is the transformation law for the coordinate velocity. 

If the particle is moving along the $x$-axis at the speed of light in the unprimed frame ($u^1 = c$) 
then the velocity in the primed frame is $u'^1 = (c - v) / (1 - v/c) = c$. This is in accord with 
the constancy of the speed of light in all frames. 
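A short numerical sketch of the velocity transformation, in units with $c = 1$ by default, follows. The function names `gamma_factor` and `transform_velocity_x` and the sample speeds are illustrative assumptions.

```python
import math


def gamma_factor(speed, c=1.0):
    """Lorentz factor for a particle or frame moving at the given speed."""
    return 1.0 / math.sqrt(1.0 - (speed / c) ** 2)


def transform_velocity_x(u1, v, c=1.0):
    """Equation (1.35): x-velocity seen from a frame boosted by v along x."""
    return (u1 - v) / (1.0 - v * u1 / c ** 2)


u1, v = 0.8, 0.5   # particle moving along x (theta = 0), boost speed v

u1_new = transform_velocity_x(u1, v)
# Equation (1.34) with cos(theta) = 1 and c = 1.
gamma_new = gamma_factor(v) * gamma_factor(u1) * (1.0 - u1 * v)

print(u1_new)                               # transformed coordinate velocity
print(gamma_new, gamma_factor(u1_new))      # (1.34) agrees with gamma of u'
print(transform_velocity_x(1.0, v))         # a photon still moves at c = 1
```

The last line illustrates that a particle moving at $c$ in one frame moves at $c$ in the boosted frame, whatever the boost speed.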

Finally, in the rest frame of the particle, $\vec{U} = (c, \mathbf{0})$, so dotting some vector with a particle’s 
4-velocity is a useful way to extract (up to a factor $-c$) the time component of the 4-vector as seen 
in the particle’s frame of reference. 


1.6 The 4-acceleration 


The four-acceleration is 

$$ \vec{A} = \frac{d\vec{U}}{d\tau} \qquad (1.36) $$

and is another 4-vector. 

The scalar product of the 4-acceleration and the 4-velocity is $\vec{A} \cdot \vec{U} = \frac{1}{2} d(\vec{U} \cdot \vec{U}) / d\tau$, which vanishes 
because the squared length of the 4-velocity is $\vec{U} \cdot \vec{U} = -c^2$, which is invariant. Thus the four acceleration 
is always orthogonal to the 4-velocity. 

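In terms of the coordinate 3-velocity, the 4-acceleration is 

$$ \vec{A} = \gamma \left( \frac{d(\gamma c)}{dt}, \frac{d(\gamma \mathbf{u})}{dt} \right) \qquad (1.37) $$

and a little algebra gives the norm of the 4-acceleration in terms of the particle’s coordinate 
acceleration and coordinate velocity as 

$$ \vec{A} \cdot \vec{A} = \gamma^4 \left( \mathbf{a} \cdot \mathbf{a} + \gamma^2 (\mathbf{u} \cdot \mathbf{a} / c)^2 \right). \qquad (1.38) $$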


In the particle’s rest-frame $\vec{U} = (c, 0, 0, 0)$ so $A^0 = 0$, and therefore the norm is just equal to the 
square of the proper acceleration: $\vec{A} \cdot \vec{A} = |\mathbf{a}_0|^2$, so (1.38) gives the acceleration felt by a particle in 
terms of the coordinate acceleration in the observer’s frame of reference. 

If we decompose the 3-acceleration into components $a_\perp$ and $a_\parallel$ which are perpendicular and 
parallel to the velocity vector $\mathbf{u}$ it is easy to show that 

$$ \vec{A} \cdot \vec{A} = |\mathbf{a}_0|^2 = \gamma^4 (a_\perp^2 + \gamma^2 a_\parallel^2) \qquad (1.39) $$


Of particular interest is the case $\mathbf{a} = \mathbf{a}_\perp$, as is the case for a particle being accelerated by a static 
magnetic field. In that case, the rest-frame acceleration is larger than in the ‘lab-frame’ by a factor 
$\gamma^2$. This is easily understood. Observers in different inertial frames agree on the values of transverse 
distances as these are not affected by the Lorentz boost matrix. The second time derivative of the 
transverse position of the particle is larger in the instantaneous rest-frame than in the lab-frame, 
simply because the time elapsed between two events on the particle’s world-line is shorter, by a 
factor $\gamma$, in the rest-frame ($d\tau = dt / \gamma$). This will prove useful when we 
want to calculate relativistic synchrotron radiation. 
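The factors of $\gamma^2$ and $\gamma^3$ for perpendicular and parallel acceleration follow directly from (1.38), as this minimal sketch illustrates. Units with $c = 1$ are used by default; the function name `proper_acceleration` and the sample velocity and accelerations are illustrative assumptions.

```python
import math


def proper_acceleration(u, a, c=1.0):
    """Magnitude of the rest-frame acceleration from equation (1.38), given the
    lab-frame 3-velocity u and 3-acceleration a as 3-tuples."""
    u_sq = sum(x * x for x in u)
    a_sq = sum(x * x for x in a)
    u_dot_a = sum(x * y for x, y in zip(u, a))
    gamma = 1.0 / math.sqrt(1.0 - u_sq / c ** 2)
    norm_sq = gamma ** 4 * (a_sq + gamma ** 2 * (u_dot_a / c) ** 2)
    return math.sqrt(norm_sq)


u = (0.6, 0.0, 0.0)                 # velocity along x, so gamma = 1.25
gamma = 1.0 / math.sqrt(1.0 - 0.6 ** 2)

a0_perp = proper_acceleration(u, (0.0, 2.0, 0.0))   # acceleration transverse to u
a0_par = proper_acceleration(u, (2.0, 0.0, 0.0))    # acceleration along u

print(a0_perp / 2.0, gamma ** 2)    # transverse case: enhanced by gamma^2
print(a0_par / 2.0, gamma ** 3)     # parallel case: enhanced by gamma^3
```

Both ratios match the decomposition in (1.39), since $|\mathbf{a}_0| = \gamma^2 a_\perp$ for purely transverse and $\gamma^3 a_\parallel$ for purely parallel acceleration.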


1.7 The 4-momentum 

Multiplying the 4-velocity of a particle by its rest mass $m$ (another invariant) gives the 
four-momentum 

$$ \vec{P} = m \vec{U} = \gamma m (c, \mathbf{u}). \qquad (1.40) $$

The spatial components of the 4-momentum differ from the non-relativistic form by the factor $\gamma$. 
To see why this is necessary consider the situation illustrated in figure 1.5. 

Some texts use the notation $m_0$ for the rest-mass and set $m = \gamma m_0$. The space components of 
the relativistic 4-momentum are then $\mathbf{P} = m \mathbf{u}$, just as in non-relativistic mechanics. We do not 
follow that convention. 
Figure 1.5: Illustration of the relativistic form for the momentum. Two observers A and B pass each 
other on rapidly moving carriages and as they do so they bounce balls off each other, exchanging 
momentum. The upper panel shows the symmetric situation in the center of mass frame. The lower 
panel shows the situation from B’s point of view. Now B assigns a longer time interval to the pair 
of events A and A’ than to B, B’, while transverse distances are invariant, so it follows that the 
transverse velocity he assigns to A’s ball is lower than his own by a factor $\gamma$. Thus, in B’s frame 
$m u_x$ is not conserved, but $\gamma m u_x$ is conserved in the collision. 


The time component of the 4-momentum is 

$$ c P^0 = \gamma m c^2 = \frac{m c^2}{\sqrt{1 - v^2/c^2}} = m c^2 + \frac{1}{2} m v^2 + \ldots \qquad (1.41) $$

which, aside from the constant $m c^2$, coincides with the kinetic energy for low velocities, and we call 
$c P^0 = E$ the total energy. The 4-momentum is 

$$ \vec{P} = (E/c, \mathbf{P}). \qquad (1.42) $$

The 4-momentum for a massive particle is a time-like vector and its invariant squared length satisfies 

$$ -\vec{P} \cdot \vec{P} = E^2 / c^2 - \mathbf{P} \cdot \mathbf{P} = m^2 c^2. \qquad (1.43) $$

Massive particles are said to ‘live on the mass-shell’ in 4-momentum space. 

All these quantities and relations are well-defined in the limit $m \to 0$. For massless particles 
$E^2 = |\mathbf{P}|^2 c^2$, and with $E = \hbar \omega$, $\mathbf{P} = \hbar \mathbf{k}$ the 4-momentum is then 

$$ \vec{P} = \hbar (\omega / c, \mathbf{k}). \qquad (1.44) $$

The total 4-momentum for a composite system is the sum of the 4-momenta for the component 
parts, and all 4 components are conserved. Note that the mass of a composite system is not the 
sum of the masses of the components, since the total mass contains, in addition to the rest mass, 
any energy associated with internal motions etc. 
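The statement that the mass of a composite system exceeds the sum of the rest masses when there are internal motions can be illustrated with two particles moving in opposite directions. This is a minimal sketch in units with $c = 1$; the function names `four_momentum` and `invariant_mass` and the sample masses and velocities are illustrative assumptions.

```python
import math


def four_momentum(m, velocity):
    """4-momentum (E, px, py, pz) of a particle of rest mass m and 3-velocity
    given as a 3-tuple, in units with c = 1 (equation (1.40))."""
    speed_sq = sum(x * x for x in velocity)
    gamma = 1.0 / math.sqrt(1.0 - speed_sq)
    return (gamma * m,) + tuple(gamma * m * x for x in velocity)


def invariant_mass(momenta):
    """Mass of a system from the sum of its 4-momenta, using (1.43) with c = 1."""
    total = [sum(p[i] for p in momenta) for i in range(4)]
    m_sq = total[0] ** 2 - total[1] ** 2 - total[2] ** 2 - total[3] ** 2
    return math.sqrt(max(m_sq, 0.0))   # guard against tiny negative rounding


# Two unit-mass particles approaching each other at 0.6 c.
pair = [four_momentum(1.0, (0.6, 0.0, 0.0)),
        four_momentum(1.0, (-0.6, 0.0, 0.0))]

print(invariant_mass(pair))        # exceeds 2: includes the energy of motion
print(invariant_mass(pair[:1]))    # a single particle returns its rest mass
```

With $\gamma = 1.25$ for each particle the system mass is $2 \gamma m = 2.5$, compared with a summed rest mass of 2.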


1.8 Doppler Effect 

Figure 1.6: Left panel (source frame) shows a photon emitted from a source as seen in the rest-frame. 
Right panel (observer frame) shows the situation in the observer frame in which the source has 
velocity $\mathbf{v} = v \hat{\mathbf{x}}$. 



1.9 Relativistic Beaming 

Consider a source which emits radiation isotropically in its rest frame. What is the angular distri¬ 
bution of radiation in some other inertial frame? 

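Let a particular photon have source-frame 4-momentum 

$$ \vec{P} = \frac{E}{c} \begin{bmatrix} 1 \\ \cos\theta \\ \sin\theta \cos\phi \\ \sin\theta \sin\phi \end{bmatrix} \qquad (1.48) $$

and let the source have velocity $\mathbf{v} = v \hat{\mathbf{x}}$ in the observer frame. The observer therefore has velocity 
$\mathbf{v} = -v \hat{\mathbf{x}}$ in the source-frame, and so the photon 4-momentum in the observer frame (primed frame) is 

$$ \frac{E'}{c} \begin{bmatrix} 1 \\ \cos\theta' \\ \sin\theta' \cos\phi' \\ \sin\theta' \sin\phi' \end{bmatrix} = \frac{E}{c} \begin{bmatrix} \gamma (1 + \beta \cos\theta) \\ \gamma (\cos\theta + \beta) \\ \sin\theta \cos\phi \\ \sin\theta \sin\phi \end{bmatrix} \qquad (1.49) $$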


Comparing the ratio $P^y / P^z$ in the two frames shows that $\phi' = \phi$; the azimuthal angle is the same in 
both frames. Comparing $P^y / P^x$ reveals that 

$$ \tan\theta' = \frac{\sin\theta}{\gamma (\cos\theta + \beta)}. \qquad (1.50) $$


For large velocities $v \to c$, so $\gamma \gg 1$, and this results in the photon trajectories in the observer frame 
being confined to a narrow cone of width $\theta' \sim 1/\gamma$ around the direction of motion of the source. For 
example, consider ‘equatorial rays’ in the source-frame for which $\theta = 90°$. In the observer frame 
these have 

$$ \tan\theta' = \frac{1}{\gamma \beta} \quad \Rightarrow \quad \theta' \simeq \frac{1}{\gamma} \qquad (1.51) $$

so the width of the beam is on the order of $1/\gamma$ for $\gamma \gg 1$. This result will be useful when we 
consider synchrotron radiation. 
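The aberration formula (1.50) is simple to evaluate numerically. The sketch below uses `math.atan2` so that rays with $\cos\theta + \beta < 0$ are correctly placed in the backward hemisphere; the function name `aberrate` and the choice $\gamma = 10$ are illustrative assumptions.

```python
import math


def aberrate(theta, beta):
    """Observer-frame polar angle theta' for a photon emitted at source-frame
    polar angle theta, from equation (1.50). Angles are in radians."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return math.atan2(math.sin(theta), gamma * (math.cos(theta) + beta))


gamma = 10.0
beta = math.sqrt(1.0 - 1.0 / gamma ** 2)

# 'Equatorial' rays, emitted at 90 degrees in the source frame.
theta_eq = aberrate(math.pi / 2.0, beta)
print(theta_eq, 1.0 / gamma)       # close to 1/gamma for gamma >> 1

# Forward and backward rays keep their directions along the axis.
print(aberrate(0.0, beta), aberrate(math.pi, beta))
```

Since half of the isotropically emitted photons have $\theta < 90°$ in the source frame, half of them arrive within the cone $\theta' < \theta'_{\rm eq} \simeq 1/\gamma$ in the observer frame.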

It is also interesting to consider the energy flux in the beam. The Doppler formula says that the 
energy of the photons is boosted by a factor 

$$ \frac{h \nu'}{h \nu} = \frac{1}{\gamma (1 - \beta \cos\theta')}. \qquad (1.52) $$

Now $\cos\theta' \simeq (1 - \theta'^2 / 2 + \ldots)$ and $\beta = (1 - \gamma^{-2})^{1/2} \simeq (1 - \gamma^{-2} / 2 + \ldots)$ so 

$$ \gamma (1 - \beta \cos\theta') \simeq \gamma \left( 1 - (1 - \gamma^{-2}/2)(1 - \theta'^2/2) \right) \simeq \frac{\gamma}{2} (\gamma^{-2} + \theta'^2) \sim 1/\gamma, \qquad (1.53) $$

where we have used $\theta' \sim 1/\gamma$ for $\gamma \gg 1$. The typical energy boost factor is therefore $h \nu' / h \nu \sim \gamma$. 
These photons are compressed by a factor $\sim \gamma^2$ in solid angle so the energy per unit area is 
increased by a factor $\sim \gamma^3$. What about the rate at which this energy flows? Consider a finite 
wave train of $N$ waves. This will be emitted in time $\Delta t = N / \nu$ in the rest frame, but will pass our 
observer in time $\Delta t' = N / \nu' \sim \Delta t / \gamma$, with the net result that the energy flux (i.e. the energy per unit 
area per unit time) is increased by a factor $\sim \gamma^4$. 

We will consider the transformation of radiation intensity more rigorously below. 


1.10 Relativistic Decays 


As an example of the use of 4-momentum conservation, consider a massive particle of mass $M$ which 
spontaneously decays into two lighter decay products of mass $m_1$ and $m_2$ with energies (as measured 
in the rest-frame of the initial particle) $E_1$ and $E_2$ (see figure 1.7). We shall set $c = 1$ for clarity in 
this section. 

Conservation of energy and momentum gives

$$M = E_1 + E_2, \qquad 0 = \mathbf{P}_1 + \mathbf{P}_2. \tag{1.54}$$

The latter tells us that $|\mathbf{P}_1|^2 = |\mathbf{P}_2|^2$, but $|\mathbf{P}_1|^2 = E_1^2 - m_1^2$ (and similarly for the second particle),
and so 4-momentum conservation can also be written as

$$M = E_1 + E_2, \qquad E_1^2 - m_1^2 = E_2^2 - m_2^2 \tag{1.55}$$

Figure 1.7: Decay of a heavy particle of mass $M$ into two lighter decay products $m_1$, $m_2$. Four-momenta
in the rest frame of the decaying particle are indicated.


and solving this pair of equations for the two unknowns $E_1$, $E_2$ yields

$$E_1 = \frac{M^2 + m_1^2 - m_2^2}{2M}, \qquad E_2 = \frac{M^2 - m_1^2 + m_2^2}{2M} \tag{1.56}$$

which are the energies of the decay products in the center-of-momentum frame.
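
As a quick sanity check on (1.56), here is a small sketch in units with $c = 1$. The function name and the example masses are arbitrary illustrations, not taken from the notes.

```python
import math

def two_body_decay(M, m1, m2):
    """Rest-frame energies and common momentum of the decay products, equation (1.56)."""
    if M < m1 + m2:
        raise ValueError("decay is kinematically forbidden: M < m1 + m2")
    E1 = (M**2 + m1**2 - m2**2) / (2.0 * M)
    E2 = (M**2 - m1**2 + m2**2) / (2.0 * M)
    p = math.sqrt(max(E1**2 - m1**2, 0.0))   # |P1| = |P2|
    return E1, E2, p

# Arbitrary illustrative masses
E1, E2, p = two_body_decay(10.0, 3.0, 4.0)
print(E1 + E2)                                    # should reproduce M
print(E1**2 - 3.0**2, E2**2 - 4.0**2, p**2)       # all three should agree
```

The two printed lines verify energy conservation and the equality of the momenta, which are the two conditions (1.55) that were solved.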

Now consider the inverse process where two energetic particles collide and merge to form a heavier
particle. The sum of the particle energies in the center-of-momentum frame then sets a threshold;
this is the maximum mass of particle that can be created. If we fire two equal mass particles at each
other with energy $E_1 = E_2$ then the available energy is $M = E_1 + E_2$. If, on the other hand, we fire a
particle of mass $m_1$ and energy $E_1$ at a stationary target of mass $m_2$, then the total energy of the resulting particle is
$E = E_1 + m_2$ and the total momentum of the product is $\mathbf{P} = \mathbf{P}_1$, or equivalently $P^2 = P_1^2 = E_1^2 - m_1^2$,
and so the mass of the product is

$$M^2 = E^2 - P^2 = (E_1 + m_2)^2 - E_1^2 + m_1^2 = m_1^2 + m_2^2 + 2 m_2 E_1. \tag{1.57}$$

In the highly relativistic case where $E_1 \gg m_1, m_2$ the mass threshold is $M \simeq \sqrt{2 m_2 E_1}$, which is much
less than the mass threshold if one were to collide two particles of energy $E_1$ in a head-on collision.
This is because in the stationary target case most of the energy is carried off in the momentum of
the resulting particle, and the mass threshold is reduced by a factor $\sim 1/\sqrt{\gamma}$. This explains why the
highest energy collisions are obtained in particle accelerators which collide counter-rotating beams
of particles and anti-particles.
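
The difference between the two configurations can be made concrete with a short sketch, again with $c = 1$. The function names and the numbers ($m_1 = m_2 = 1$, $E_1 = 1000$) are illustrative assumptions.

```python
import math

def mass_fixed_target(E1, m1, m2):
    """Mass of the merged product for a stationary target, equation (1.57)."""
    return math.sqrt(m1**2 + m2**2 + 2.0 * m2 * E1)

def mass_head_on(E1):
    """Mass of the merged product for two equal-mass beams of equal energy E1 colliding head-on."""
    return 2.0 * E1

E1, m1, m2 = 1000.0, 1.0, 1.0
M_fixed = mass_fixed_target(E1, m1, m2)
M_collider = mass_head_on(E1)
print(M_fixed, M_collider)
print(M_fixed / M_collider, math.sqrt(m2 / (2.0 * E1)))   # the two ratios should nearly agree
```

For equal masses the beam Lorentz factor is $\gamma = E_1/m_1$, so the reduction factor $\sqrt{m_2/2E_1}$ is indeed of order $1/\sqrt{\gamma}$.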


1.11 Invariant Volumes and Densities 

Boosts of the observer induce changes in the spatio-temporal coordinates of events and thereby 
modify the 3-volume of a box, for instance. Boosts also cause changes in the energies and momenta 
of particles, and therefore modify the 3-volume of momentum space occupied by some set of particles, 
and therefore cause the momentum-space density of particles to vary etc. 

There are however certain combinations of volumes, densities etc that remain invariant under 
Lorentz boosts, and it is highly desirable to write the laws of physics in a manner which makes use 
of these invariants as much as possible. 

1.11.1 Space-Time Volume Element 

One example of an invariant volume we have already seen is the space-time volume element: 

$$dV\,dt = dx\,dy\,dz\,dt \quad \text{is Lorentz invariant.} \tag{1.58}$$

1.11.2 Momentum-Space Volume Element

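Now consider momentum space. Consider a set of particles which are nearly at rest, but actually
have a small range of 3-momenta $0 < p_x < \Delta p_x$ etc. Consider the difference in 4-momenta of the
particles at the origin and at the maximum of the range of momenta. To first order in $\Delta p_i$ this is
$\Delta P = (0, \Delta p_x, \Delta p_y, \Delta p_z)$, since the range in energy is quadratic in the momenta for low momenta,
so we can ignore it. In a boosted frame, this 4-momentum difference is

$$\begin{bmatrix} \Delta P'_0 \\ \Delta p'_x \\ \Delta p'_y \\ \Delta p'_z \end{bmatrix} = \begin{bmatrix} \gamma & -\gamma\beta & 0 & 0 \\ -\gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ \Delta p_x \\ \Delta p_y \\ \Delta p_z \end{bmatrix} = \begin{bmatrix} -\gamma\beta\,\Delta p_x \\ \gamma\,\Delta p_x \\ \Delta p_y \\ \Delta p_z \end{bmatrix} \tag{1.59}$$

and therefore

$$\Delta p'_x\,\Delta p'_y\,\Delta p'_z = \gamma\,\Delta p_x\,\Delta p_y\,\Delta p_z \tag{1.60}$$

but the energy in the rest frame is $E = m$ while in the lab frame $E' = \gamma m$, so $\gamma = E'/E$ and hence

$$\frac{d^3p}{E} \quad \text{is Lorentz invariant.} \tag{1.61}$$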


This says that a boost will change both the energy of a bunch of particles and also the 3-momentum 
volume that they occupy, but the combination above is invariant under boosts. This is very useful 
in formulating the relativistic Boltzmann equation. 
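
The argument above starts from particles nearly at rest, but the invariance of $d^3p/E$ holds for any momentum. A minimal numerical sketch (units $c = 1$, boost along $x$; the helper names and the sample momentum are illustrative assumptions) computes the Jacobian determinant $|\partial \mathbf{p}'/\partial \mathbf{p}|$ by finite differences and compares it with $E'/E$.

```python
import math

def energy(p, m):
    """Energy of a particle with 3-momentum p and mass m."""
    return math.sqrt(m**2 + sum(c * c for c in p))

def boost(p, m, beta):
    """3-momentum after a boost with velocity beta along the x-axis."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [gamma * (p[0] - beta * energy(p, m)), p[1], p[2]]

def jacobian_det(p, m, beta, h=1e-6):
    """Determinant of d p'_i / d p_j estimated by central differences."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        up, dn = list(p), list(p)
        up[j] += h
        dn[j] -= h
        bu, bd = boost(up, m, beta), boost(dn, m, beta)
        for i in range(3):
            J[i][j] = (bu[i] - bd[i]) / (2.0 * h)
    (a, b, c), (d, e, f), (g, k, l) = J
    return a * (e * l - f * k) - b * (d * l - f * g) + c * (d * k - e * g)

p, m, beta = [0.3, -1.2, 0.7], 1.0, 0.6       # arbitrary test values
det = jacobian_det(p, m, beta)
ratio = energy(boost(p, m, beta), m) / energy(p, m)
print(det, ratio)   # d^3p' / d^3p should equal E'/E
```

Agreement of the two numbers is the statement $d^3p'/E' = d^3p/E$.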


1.11.3 Momentum-Space Density 


We define the momentum-space density or momentum distribution function $n(\mathbf{p})$ so that $n(\mathbf{p})\,d^3p$
is the number of particles in 3-momentum volume element $d^3p$. Like any number, this is Lorentz
invariant, so

$$n(\mathbf{p})\,d^3p = E\,n(\mathbf{p})\,\frac{d^3p}{E} \quad \text{is Lorentz invariant} \tag{1.62}$$

and therefore

$$E\,n(\mathbf{p}) \quad \text{is Lorentz invariant.} \tag{1.63}$$


1.11.4 Spatial Volume and Density 

Consider a set of particles all moving at the same velocity relative to some inertial reference frame,
as illustrated in figure 1.8. This figure shows that if a certain number of particles occupy a certain
region in the lab frame then they occupy a region in the rest frame which is larger by a factor $\gamma$. This
means that spatial volumes in the rest frame $d^3r_0$ and in the lab frame are related by $d^3r_0 = \gamma\,d^3r$,
but $E_0 = m$ and $E = \gamma m$, so $\gamma = E/E_0$ and therefore

$$E\,d^3r \quad \text{is Lorentz invariant.} \tag{1.64}$$

Since the number of particles in some volume element $n(\mathbf{r})\,d^3r$ is clearly Lorentz invariant, this
means that the spatial density transforms such that

$$\frac{n(\mathbf{r})}{E} \quad \text{is Lorentz invariant.} \tag{1.65}$$

This has an interesting consequence. Consider a neutral plasma consisting of streams of electrons 
and positrons propagating at equal velocities but in opposite directions. The two streams have equal 
densities by symmetry. Now consider the situation as perceived by an observer moving in the same 
direction as the electrons. That observer sees the positrons to have a higher energy, and therefore 
a higher space density, than the electrons. For that observer the plasma is not neutral but has a 
positive charge density. 
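
The plasma example can be put into numbers with a short sketch (units $c = 1$, motion along one axis; the function names and the chosen speeds are illustrative assumptions). Each stream's density scales with its Lorentz factor, since $n/E$ is invariant.

```python
import math

def lorentz_factor(v):
    return 1.0 / math.sqrt(1.0 - v * v)

def velocity_in_moving_frame(u, v):
    """Velocity of a stream moving at u (lab frame) as seen by an observer moving at v."""
    return (u - v) / (1.0 - u * v)

def density_in_moving_frame(n_lab, u, v):
    """Stream density for the moving observer, using the invariance of n/E (so n is proportional to gamma)."""
    return n_lab * lorentz_factor(velocity_in_moving_frame(u, v)) / lorentz_factor(u)

n_lab, u, v = 1.0, 0.5, 0.5      # electrons move at +u, positrons at -u, observer at +v
n_electrons = density_in_moving_frame(n_lab, +u, v)
n_positrons = density_in_moving_frame(n_lab, -u, v)
print(n_electrons, n_positrons)
# Cross-check: net number density excess expected from boosting the lab-frame current
print(n_positrons - n_electrons, lorentz_factor(v) * 2.0 * n_lab * u * v)
```

The excess of positrons over electrons agrees with what one obtains by transforming the lab-frame 4-current (zero charge density, non-zero current) directly.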

Figure 1.8: Illustration of the transformation of the spatial volume occupied by a set of particles.
Left panel shows world-lines for a set of particles all with the same non-zero momentum in the
‘laboratory frame’. The horizontal line spans the range of $x$-coordinate $\Delta x$ which contains a certain
number of particles (four here) and this interval has $\Delta t = 0$. Right panel shows the same thing
from the rest frame of the particles. The particle world-lines are now vertical, and the transformed
interval is now tilted, with extent $\Delta x_0$ and $\Delta t_0 \neq 0$. Since $\Delta x^2 - \Delta t^2$ is invariant, this means that $\Delta x_0 > \Delta x$.

1.11.5 Phase-Space Density 

We define the phase-space density or phase-space distribution function $f(\mathbf{r}, \mathbf{p})$ such that $f(\mathbf{r}, \mathbf{p})\,d^3r\,d^3p$
is the number of particles in 6-volume $d^3r\,d^3p$. Since $E\,d^3r$ and $d^3p/E$ are Lorentz invariant,
so is $d^3r\,d^3p$, and since $f(\mathbf{r}, \mathbf{p})\,d^3r\,d^3p$ is a number of particles this means that

$$f(\mathbf{r}, \mathbf{p}) \quad \text{is Lorentz invariant.} \tag{1.66}$$


1.11.6 Specific Intensity 

We can use the foregoing to compute how the energy density of radiation and the specific intensity 
transform under a boost of one’s frame of reference. 

The spatial energy density for particles occupying a momentum-space volume $d^3p$ is given by the
product of the spatial number density and the energy

$$d^3u = f(\mathbf{r}, \mathbf{p})\,E\,d^3p = f(\mathbf{r}, \mathbf{p})\,E\,p^2\,dp\,d\Omega. \tag{1.67}$$

For photons, or any zero rest mass particle, $p = h\nu/c$ and $E = h\nu$, so $dp = (h/c)\,d\nu$ and

$$d^3u = u_\nu(\Omega)\,d\nu\,d\Omega = \frac{h^4}{c^3} f(\mathbf{r}, \mathbf{p})\,\nu^3\,d\nu\,d\Omega \tag{1.68}$$

and, since $f(\mathbf{r}, \mathbf{p})$ is Lorentz invariant, dividing through by $d\nu\,d\Omega$ shows that the specific energy
density transforms such that

$$\frac{u_\nu(\Omega)}{\nu^3} \quad \text{is Lorentz invariant} \tag{1.69}$$

and therefore that the specific intensity also transforms in such a way that

$$\frac{I_\nu(\Omega)}{\nu^3} \quad \text{is Lorentz invariant.} \tag{1.70}$$


This is an extremely useful result. If one can compute the change in energy of a photon $\nu \to \nu'$
induced by a boost of the observer frame then we obtain the transformation of the intensity

$$I'_{\nu'}(\Omega') = \left(\frac{\nu'}{\nu}\right)^3 I_\nu(\Omega). \tag{1.71}$$

If we integrate over frequencies we find that the bolometric intensity transforms as

$$I'(\Omega') = \left(\frac{\nu'_\star}{\nu_\star}\right)^4 I(\Omega) \tag{1.72}$$

where $\nu_\star$ is some characteristic frequency (e.g. the median frequency, the energy weighted mean
frequency, the frequency of peak intensity or any other fiducial point on the spectrum). For a source
emitting black-body radiation, this means that any observer also sees black-body radiation, though
with temperature scaled according to the Doppler frequency shift.
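
The black-body statement is easy to verify numerically. The sketch below works in units with $h = k = c = 1$; the function names, the temperature and the Doppler factor are illustrative assumptions. It applies the invariance of $I_\nu/\nu^3$ to a Planck spectrum and compares the result with a Planck spectrum at the Doppler-scaled temperature.

```python
import math

def planck(nu, T):
    """Planck specific intensity B_nu(T) in units with h = k = c = 1."""
    return 2.0 * nu**3 / math.expm1(nu / T)

def boosted_intensity(nu_obs, T, doppler):
    """Observed intensity from invariance of I_nu / nu^3, with doppler = nu'/nu."""
    nu_src = nu_obs / doppler
    return doppler**3 * planck(nu_src, T)

T, doppler = 1.0, 3.0            # assumed source temperature and Doppler factor
for nu_obs in (0.1, 1.0, 5.0, 20.0):
    # The two columns should agree: a black body at temperature doppler * T
    print(boosted_intensity(nu_obs, T, doppler), planck(nu_obs, doppler * T))
```

The agreement at every frequency reflects the fact that the Planck function depends on frequency only through $\nu^3$ times a function of $\nu/T$.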


1.12 Emission from Relativistic Particles 


A useful procedure for computing the total power radiated by an accelerated relativistic particle is 
to go into the instantaneous rest frame of the particle (primed frame) and compute the power using 
Larmor’s formula and then transform back to the (unprimed) observer frame. 

The latter transformation is trivial, since provided the radiation emission is front-back symmetric
(as is the case for dipole radiation and for most emission mechanisms covered here) the net
momentum of the emitted radiation vanishes, so if an amount of energy $dW'$ is radiated then the
net 4-momentum of the radiation is $P' = (dW'/c, 0, 0, 0)$ and therefore the energy in the observer
frame is just $dW = \gamma\,dW'$. Similarly, if this energy is emitted in a time $dt'$ in the rest frame, this
corresponds to an interval $dt = \gamma\,dt'$ in the observer frame and therefore

$$P = \frac{dW}{dt} = P' = \frac{dW'}{dt'} \tag{1.73}$$


so the radiated power is Lorentz invariant.

Larmor’s formula gives

$$P' = \frac{2q^2}{3c^3}\,|\mathbf{a}'|^2 \tag{1.74}$$

but, as discussed, the time component of the 4-acceleration vanishes in the rest frame, and so
$|\mathbf{a}'|^2 = A \cdot A$ and

$$P = \frac{2q^2}{3c^3}\,A \cdot A \tag{1.75}$$

which is manifestly covariant.

Using (1.39) this can also be written

$$P = \frac{2q^2}{3c^3}\,\gamma^4\left(a_\perp^2 + \gamma^2 a_\parallel^2\right) \tag{1.76}$$


which gives the power radiated in terms of the 3-acceleration measured in the observer frame. 
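
To get a feel for (1.76), the following sketch evaluates it in Gaussian units with $c = 1$ by default. The function name and the sample values are illustrative assumptions. It shows the non-relativistic limit and the different $\gamma$ scalings for perpendicular and parallel acceleration.

```python
def radiated_power(q, gamma, a_perp, a_par, c=1.0):
    """Total radiated power from equation (1.76)."""
    return 2.0 * q**2 / (3.0 * c**3) * gamma**4 * (a_perp**2 + gamma**2 * a_par**2)

larmor = radiated_power(1.0, 1.0, 3.0, 4.0)       # gamma = 1 recovers Larmor with |a|^2 = 25
circular = radiated_power(1.0, 10.0, 1.0, 0.0)    # acceleration perpendicular to the velocity
linear = radiated_power(1.0, 10.0, 0.0, 1.0)      # acceleration parallel to the velocity
print(larmor, circular, linear, linear / circular)
```

For a given coordinate acceleration, parallel acceleration radiates more than perpendicular acceleration by a factor $\gamma^2$.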

1.13 Problems

1.13.1 Speed and Velocity Transformation 

Consider a particle moving with coordinate velocity $\mathbf{u}$ in your frame (unprimed).

1. What is the gamma-factor $\gamma_u$ that you assign to the particle?

2. Write down the 4-velocity $U$ for this particle in your frame of reference.

3. Apply a boost to obtain the 4-velocity $U'$ for an observer (your friend) moving at velocity
$\mathbf{v} = (v, 0, 0)$ relative to you.

4. Obtain an expression for the gamma-factor $\gamma_{u'}$ for the particle in your friend’s frame of reference
in terms of $\gamma_u$, $u$, $v$, $c$ and $\cos\theta = \hat{\mathbf{v}} \cdot \hat{\mathbf{u}}$.

5. Obtain an expression for $u'_x$, the $x$-component of the particle’s coordinate velocity in your
friend’s frame, in terms of $v$, $u_x$ and $c$.

6. Apply this to the case of a particle with $\mathbf{u} = (c, 0, 0)$. What is $\mathbf{u}'$?

1.13.2 Four-Acceleration 

1. Write down the 4-velocity $U$ for a particle with coordinate 3-velocity $\mathbf{u}$ in terms of $\mathbf{u}$, $c$ and
$\gamma = 1/\sqrt{1 - u^2/c^2}$.

2. The 4-acceleration is $A = dU/d\tau$ where $\tau$ is proper time along the particle’s world line. Rewrite
this in terms of the rate of change of the 4-velocity with respect to coordinate time $t$.

3. Obtain an expression for the 4-acceleration in terms of $\gamma$, $\beta = u/c$, $a_\perp$ and $a_\parallel$ (these being
the components of the coordinate acceleration $\mathbf{a} = d\mathbf{u}/dt$ in the directions perpendicular and
parallel to $\mathbf{u}$). The factor $\beta$ should appear only once.

4. Use the above to obtain an expression for the norm $A \cdot A$ in terms of $\gamma$, $a_\perp$ and $a_\parallel$.

5. What is the time component of the 4-acceleration in the rest frame of the particle?

6. How is the squared coordinate acceleration $|\mathbf{a}|^2$ in the rest frame related to the invariant $A \cdot A$?

7. Use the above to obtain the rest-frame $|\mathbf{a}|^2$ for a particle which in our frame has coordinate
acceleration $\mathbf{a}$.

1.13.3 Geometry of Minkowski space 

Consider an explosion which occurs at the origin of Minkowski space coordinates $t = r = 0$ which
results in a cloud of test particles flying radially outward at all velocities $v < c$.

a. Show that the surface of constant proper time $\tau$ as measured by the test particles is the hyperboloid

$$t^2 = \tau^2 + r^2 \tag{1.77}$$

where $r^2 = x^2 + y^2 + z^2$. Sketch the intersection of this surface with the plane $y = z = 0$ and
also show some representative test particle trajectories.

b. Construct the spatial metric (line element) on this curved hypersurface as follows:

1. Set up polar coordinates $r, \theta, \varphi$ such that $(x, y, z) = (r\sin\theta\cos\varphi,\ r\sin\theta\sin\varphi,\ r\cos\theta)$.

2. For a tangential line element ($r$ = constant) $dt = 0$. Hence show (or argue) that the
proper length of a tangential line element is

$$(dl_t)^2 = r^2\left((d\theta)^2 + \sin^2\theta\,(d\varphi)^2\right) = r^2\,d\sigma^2 \tag{1.78}$$

3. For a radially directed line segment connecting points with Minkowski radial coordinate
$r$, $r + dr$ (but at the same $\tau$) there is a non-zero $dt$ given by $d(t^2) = d(\tau^2 + r^2) = d(r^2)$.
Hence show that the proper length is

$$(dl_r)^2 = (dr)^2 - (dt)^2 = \frac{(dr)^2}{1 + r^2/\tau^2} \tag{1.79}$$

4. Combine 2 and 3 to give

$$(dl)^2 = \frac{(dr)^2}{1 + r^2/\tau^2} + r^2\,d\sigma^2 \tag{1.80}$$

c. Now consider a rescaling of the radial coordinate $R = r/\tau$.

1. Show that $R = \gamma v$, where $\gamma = (1 - v^2)^{-1/2}$, and is therefore a constant label for each particle;
a ‘comoving radial coordinate’.

2. Rewrite the result of b.4 as

$$dl^2 = \tau^2\left(\frac{dR^2}{1 + R^2} + R^2\,d\sigma^2\right) \tag{1.81}$$

d. Show that (flat) Minkowski space, when written in $\tau, R, \theta, \varphi$ coordinates, takes the form

$$ds^2 = -d\tau^2 + \tau^2\left(\frac{dR^2}{1 + R^2} + R^2\,d\sigma^2\right) \tag{1.82}$$

Rewrite this in terms of an alternative comoving radial coordinate $\omega$ where $R = \sinh\omega$. Compare
your results with formulae for the space-time geometry of an open Friedmann-Robertson-Walker
cosmology from any standard introductory cosmology text.


1.13.4 Relativistic decays 

Consider a heavy particle $H$ of mass $M$ which decays into two light particles of equal mass $m$.

a. Show that in the rest frame of the heavy particle the energy of a decay product is $E = M/2$
and the modulus of the momentum is $|\mathbf{p}| = \beta_p M/2$ where $\beta_p = (1 - 4m^2/M^2)^{1/2}$, so the
4-momentum is

$$P = \frac{M}{2}\begin{bmatrix} 1 \\ \beta_p\,\mu \\ \beta_p\sqrt{1 - \mu^2}\,\cos\varphi \\ \beta_p\sqrt{1 - \mu^2}\,\sin\varphi \end{bmatrix} \tag{1.83}$$

b. Compute the decay product 4-momentum in a frame (the ‘laboratory frame’) in which the
decaying particle has velocity $\beta$ parallel to the $x$-axis.

c. Show that for decays which are isotropic in the rest frame of the decaying particle the decay
product energy is uniformly distributed in the range $E^- < E_p < E^+$ where

$$E^\pm = \frac{E_H}{2}\left(1 \pm \beta\beta_p\right), \qquad \beta_p = (1 - 4m^2/M^2)^{1/2}. \tag{1.84}$$

Sketch the minimum and maximum decay product energy as a function of the energy of the
decaying particle $E_H$.

d. Now consider decays from a distribution of heavy particles which all have the same energy in
the lab frame but have isotropically distributed momenta. What is the distribution in energy
of the decay products? What is the form of the phase-space distribution function $f(\mathbf{p})$ for the
decay products? (Assume that the occupation numbers for the final states are negligible.)



Chapter 2 

Dynamics 


This chapter consists of a review of some useful results from Lagrangian and Hamiltonian dynamics. 
We first introduce the concepts of generalized coordinates, the Lagrangian and the action. We 
state the principle of least action and then give some examples which show how to construct the 
Lagrangian to obtain the equations of motion. We show how energy and momentum conservation 
arise from symmetries of the Lagrangian under shift of time and spatial translations, and we also 
show how the Lagrangian formalism is useful for generating the equations of motion in transformed 
coordinates. We then review Hamilton’s equations and finally we discuss adiabatic invariance. 


2.1 Lagrangian Dynamics 

2.1.1 Generalized Coordinates 

In Lagrangian dynamics a mechanical system is described by generalized coordinates, which we
denote by $q_i$ with $i$ running from 1 through $N$, with $N$ the number of degrees of freedom. We will
also use the notation $\mathbf{q} = q_i$.

The values of the coordinates at some instant of time are generally not enough to specify the
state of the system; to fully specify the state one must also give the values of the velocities $\dot{q}_i$.
The future evolution is then determined.

2.1.2 The Lagrangian and the Action 

A mechanical system is defined by its Lagrangian. This is a scalar function of the coordinates and
velocities, and optionally time, and is denoted by $L(q_i, \dot{q}_i, t)$. It has units of energy.

The action $S$ is defined for a bounded path $\mathbf{q}(t)$ in coordinate space and is the time integral of
the Lagrangian

$$S = \int_{t_1}^{t_2} dt\, L(\mathbf{q}(t), \dot{\mathbf{q}}(t), t). \tag{2.1}$$

The action has units of angular momentum. 

2.1.3 The Principle of Least Action 

The principle of least action asserts that the actual evolutionary histories (or more briefly ‘paths’)
$\mathbf{q}(t)$ that a system follows are those which minimize (or more generally extremize) the action:

$$\delta S = \delta \int_{t_1}^{t_2} dt\, L(\mathbf{q}(t), \dot{\mathbf{q}}(t), t) = 0. \tag{2.2}$$

Figure 2.1: The lines show two hypothetical evolutionary histories $q(t)$ of a one-dimensional system. The
two paths begin and end at the same points, at times $t_1$ and $t_2$.


2.1.4 The Euler-Lagrange Equations 


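The ‘global’ minimization (2.2) implies a certain condition on the derivatives of the Lagrangian
which must be locally satisfied. This condition provides the equations of motion for the system.

Consider a system with one degree of freedom, $L(q, \dot{q}, t)$, and consider two hypothetical neighboring
paths $q(t)$ and $q'(t) = q(t) + \delta q(t)$ as illustrated in figure 2.1. The variation of $q(t)$ implies a
corresponding variation of the velocity $\delta\dot{q} = d(\delta q)/dt$.

The variation of the action is

$$\delta S = S' - S = \int dt\, L(q + \delta q, \dot{q} + \delta\dot{q}, t) - \int dt\, L(q, \dot{q}, t). \tag{2.3}$$

If we make a Taylor expansion and ignore terms higher than linear in the (assumed infinitesimal)
perturbation to the path, we have

$$\delta S = \int_{t_1}^{t_2} dt \left[ \delta q\,\frac{\partial L}{\partial q} + \delta\dot{q}\,\frac{\partial L}{\partial \dot{q}} \right]. \tag{2.4}$$

Now the second term in brackets here can be written as

$$\delta\dot{q}\,\frac{\partial L}{\partial \dot{q}} = \frac{d}{dt}\left(\delta q\,\frac{\partial L}{\partial \dot{q}}\right) - \delta q\,\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) \tag{2.5}$$

and therefore the action variation can be written as

$$\delta S = \left[\delta q\,\frac{\partial L}{\partial \dot{q}}\right]_{t_1}^{t_2} - \int_{t_1}^{t_2} dt\, \delta q \left[\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q}\right] \tag{2.6}$$

but the first term vanishes since $\delta q(t_1) = \delta q(t_2) = 0$ and, since the variation $\delta q(t)$ is arbitrary, the
term within the square brackets in the integral must vanish. This gives the Euler-Lagrange equation:

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = 0. \tag{2.7}$$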

The generalization to multi-dimensional systems is straightforward and we obtain

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_i} - \frac{\partial L}{\partial q_i} = 0 \tag{2.8}$$

which gives a set of equations, one per degree of freedom $i$. As we will now see, these equations
provide the means to evolve the state of the system; they provide the equations of motion for the
system.
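
The stationarity of the action can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses a harmonic oscillator Lagrangian $L = m\dot{q}^2/2 - kq^2/2$ with $m = k = 1$ (introduced formally in the next section), a simple midpoint discretization of the time integral, and a perturbation $\epsilon \sin(\pi t / t_1)$ that vanishes at both end points. All names are illustrative.

```python
import math

def action(path, t0, t1, m=1.0, k=1.0):
    """Discretized action for L = m qdot^2 / 2 - k q^2 / 2.

    `path` holds samples q(t) on a uniform grid from t0 to t1; the velocity and
    position are evaluated at the midpoint of each step.
    """
    n = len(path) - 1
    dt = (t1 - t0) / n
    s = 0.0
    for i in range(n):
        qdot = (path[i + 1] - path[i]) / dt
        qmid = 0.5 * (path[i + 1] + path[i])
        s += (0.5 * m * qdot**2 - 0.5 * k * qmid**2) * dt
    return s

def perturbed_path(eps, n=2000, t1=1.0):
    """True solution q = sin(t) plus eps * sin(pi t / t1), which is zero at both ends."""
    return [math.sin(i * t1 / n) + eps * math.sin(math.pi * i / n) for i in range(n + 1)]

S_true = action(perturbed_path(0.0), 0.0, 1.0)
S_plus = action(perturbed_path(+0.1), 0.0, 1.0)
S_minus = action(perturbed_path(-0.1), 0.0, 1.0)
print(S_plus - S_true, S_minus - S_true)   # equal and positive: no term linear in eps
```

The change in the action is the same for both signs of $\epsilon$, so it is second order in the perturbation, and it is positive here because the time interval is shorter than half a period, for which the extremum is a true minimum.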


2.1.5 Example Lagrangians 


So far we have not said how the Lagrangian for a system is determined; we have simply asserted 
that there is such a function whose minimization provides the equations of motion. 

Consider the Lagrangian $L = m|\dot{\mathbf{x}}|^2/2$, where $\mathbf{x}$ is the usual Cartesian spatial coordinate. Note
that this is the only function which is at most quadratic in the velocity and satisfies homogeneity
(which means $L$ cannot depend on $\mathbf{x}$) and isotropy ($L$ is independent of the direction of motion).
The Euler-Lagrange equations are then

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} = 0 \quad\Rightarrow\quad \dot{x}_i = \text{constant} \tag{2.9}$$

which is the law of inertia; we have obtained the equations of motion for a free particle, for which
the Lagrangian is just the kinetic energy.

We also see an example of what is a general rule: if the Lagrangian is independent of one of the
coordinates then the corresponding velocity is constant.

A less trivial example is a set of particles labelled by an index $a$ and with (time independent)
Lagrangian

$$L(\mathbf{x}_a, \dot{\mathbf{x}}_a) = \sum_a \frac{1}{2} m_a |\dot{\mathbf{x}}_a|^2 - U(\mathbf{x}_1, \mathbf{x}_2, \ldots) \tag{2.10}$$

for which the Euler-Lagrange equations are

$$m_a \frac{d\dot{\mathbf{x}}_a}{dt} = -\frac{\partial U}{\partial \mathbf{x}_a} \tag{2.11}$$

which we identify with Newton’s law for a system with potential energy $U(\mathbf{x}_a)$.

Note that the Lagrangian here is the kinetic energy minus the potential energy: $L = T - U$.
Again, if $L$ is independent of one of the coordinates, $\partial U/\partial \mathbf{x}_b = 0$, then the corresponding velocity
$\dot{\mathbf{x}}_b$ is constant.


2.2 Conservation Laws 


2.2.1 Energy Conservation 

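If there is no explicit time dependence of the Lagrangian, so $L = L(q, \dot{q})$, the total time derivative of the
Lagrangian is

$$\frac{dL}{dt} = \sum_i \frac{\partial L}{\partial q_i}\,\dot{q}_i + \sum_i \frac{\partial L}{\partial \dot{q}_i}\,\ddot{q}_i \tag{2.12}$$

which, using the Euler-Lagrange equation, is

$$\frac{dL}{dt} = \sum_i \dot{q}_i\,\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_i} + \sum_i \frac{\partial L}{\partial \dot{q}_i}\,\ddot{q}_i = \sum_i \frac{d}{dt}\left(\dot{q}_i\,\frac{\partial L}{\partial \dot{q}_i}\right) \tag{2.13}$$

so we have

$$\frac{d}{dt}\left(\sum_i \dot{q}_i\,\frac{\partial L}{\partial \dot{q}_i} - L\right) = 0 \tag{2.14}$$

which means that the quantity

$$E \equiv \sum_i \dot{q}_i\,\frac{\partial L}{\partial \dot{q}_i} - L \tag{2.15}$$

which we call the energy is a constant of the motion.

The energy is conserved for any system with $\partial L/\partial t = 0$. Such systems are said to be ‘closed’ or
‘isolated’.

The partial derivative of the Lagrangian $\partial L/\partial \dot{q}_i$ with respect to the coordinate velocity $\dot{q}_i$ is
called the momentum or, more verbosely, the momentum canonically conjugate to the coordinate
$q_i$. In terms of the momentum, the energy is

$$E = \sum_i \dot{q}_i\,p_i - L. \tag{2.16}$$

Note that for the system (2.10) the momenta are $\mathbf{p}_a = m_a \dot{\mathbf{x}}_a$ and the energy is

$$E = \sum_a m_a \dot{x}_a^2/2 + U(\mathbf{x}_1, \mathbf{x}_2, \ldots) \tag{2.17}$$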


which is the sum of the kinetic and potential energies as expected. 


2.2.2 Momentum Conservation 


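Consider a system in which the potential energy depends only on the relative values of the Cartesian
coordinates; i.e. the Lagrangian is invariant if we translate the entire system by a distance $\Delta\mathbf{x}$. The
change in the Lagrangian is

$$\delta L = \sum_a \frac{\partial L}{\partial \mathbf{x}_a} \cdot \Delta\mathbf{x} = 0 \tag{2.18}$$

but $\partial L/\partial \mathbf{x}_a = d(\partial L/\partial \dot{\mathbf{x}}_a)/dt$ from the E-L equations, so

$$\frac{d}{dt}\sum_a \frac{\partial L}{\partial \dot{\mathbf{x}}_a} = 0 \tag{2.19}$$

or equivalently that the quantity

$$\mathbf{P} = \sum_a \mathbf{p}_a = \sum_a \frac{\partial L}{\partial \dot{\mathbf{x}}_a} \tag{2.20}$$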


is conserved. This is called the total momentum of the system and its conservation law follows 
directly from spatial homogeneity. 

Similar arguments can be used to show that the angular momentum is conserved if the Lagrangian 
is independent of orientation. 

These quantities are the total momenta of the system. In addition to these conservation laws we
also have conservation of individual momenta

$$p_i = \frac{\partial L}{\partial \dot{q}_i} = \text{constant} \tag{2.21}$$

if the Lagrangian is independent of the corresponding generalized coordinate: $\partial L/\partial q_i = 0$.


2.3 Coordinate Transformations 

The Euler-Lagrange equations are useful for generating the equations of motion in arbitrary coordinate
systems. Since $L$ is a scalar quantity it is independent of the representation of the coordinate
system. For example, consider a free particle, for which the Lagrangian is $L = m|\dot{\mathbf{x}}|^2/2$. Now
consider what happens if one works in an expanding coordinate system and defines

$$\mathbf{x} = a(t)\,\mathbf{r} \tag{2.22}$$

with $a(t)$ a scale factor.

Expressing the Lagrangian in terms of the new $\mathbf{r}$ coordinates we have $\dot{\mathbf{x}} = \dot{a}\mathbf{r} + a\dot{\mathbf{r}}$, which means
that the Lagrangian $L = m\dot{x}^2/2$ becomes a function of both $\mathbf{r}$ and $\dot{\mathbf{r}}$:

$$L(\mathbf{r}, \dot{\mathbf{r}}) = \frac{m}{2}\left(\dot{a}^2 r^2 + 2a\dot{a}\,\mathbf{r} \cdot \dot{\mathbf{r}} + a^2 \dot{r}^2\right). \tag{2.23}$$

To form the Euler-Lagrange equations we need the partial derivatives

$$\partial L/\partial \mathbf{r} = m(a\dot{a}\,\dot{\mathbf{r}} + \dot{a}^2\,\mathbf{r}), \qquad \partial L/\partial \dot{\mathbf{r}} = m(a^2\,\dot{\mathbf{r}} + a\dot{a}\,\mathbf{r}) \tag{2.24}$$

and the Euler-Lagrange equations are then

$$\ddot{\mathbf{r}} + 2\frac{\dot{a}}{a}\,\dot{\mathbf{r}} + \frac{\ddot{a}}{a}\,\mathbf{r} = 0. \tag{2.25}$$
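
Equation (2.25) should be satisfied by any free-particle trajectory $x = x_0 + vt$ once it is rewritten as $r = x/a$. The following one-dimensional sketch checks this by finite differences. The scale factor $a(t) = t^{2/3}$, the trajectory parameters and the function names are arbitrary assumptions made for illustration only.

```python
def scale_factor(t):
    """Hypothetical scale factor a(t) = t^(2/3); any smooth positive a(t) would do."""
    return t ** (2.0 / 3.0)

def comoving_position(t, x0=1.0, v=0.5):
    """r = x / a for a free particle moving as x = x0 + v t."""
    return (x0 + v * t) / scale_factor(t)

def first_derivative(f, t, h=1e-3):
    return (f(t + h) - f(t - h)) / (2.0 * h)

def second_derivative(f, t, h=1e-3):
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h**2

def residual(t):
    """Left-hand side of equation (2.25), which should vanish up to discretization error."""
    a = scale_factor(t)
    return (second_derivative(comoving_position, t)
            + 2.0 * first_derivative(scale_factor, t) / a * first_derivative(comoving_position, t)
            + second_derivative(scale_factor, t) / a * comoving_position(t))

residuals = [residual(t) for t in (1.0, 2.0, 5.0)]
print(residuals)
```

The residuals are zero to within the accuracy of the finite differences, so the extra ‘drag’ and ‘force’ terms in (2.25) are purely a consequence of the choice of coordinates.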


2.4 Hamilton’s Equations 

In Hamilton’s formulation of dynamics we define the Hamiltonian

$$H \equiv \dot{q}p - L(q, \dot{q}, t). \tag{2.26}$$

This is identical in value to the energy defined in (2.15).

At first sight, the Hamiltonian would seem to be a function of $q, \dot{q}, p, t$. However, if we write
down the differential

$$dH = \dot{q}\,dp + p\,d\dot{q} - \frac{\partial L}{\partial \dot{q}}\,d\dot{q} - \frac{\partial L}{\partial q}\,dq - \frac{\partial L}{\partial t}\,dt \tag{2.27}$$

we see that the 2nd and 3rd terms cancel each other (recalling the definition $p = \partial L/\partial \dot{q}$) and so $dH$
contains only terms with $dp$, $dq$ and $dt$, so evidently the Hamiltonian is only a function of $q$, $p$ and
$t$:

$$H = H(q, p, t). \tag{2.28}$$

Using the Euler-Lagrange equation to replace $\partial L/\partial q$ in the fourth term in (2.27) with $\dot{p}$ and
using the definition of the Hamiltonian (2.26) to replace $\partial L/\partial t$ in the fifth term with $-\partial H/\partial t$ we
can write $dH$ as

$$dH = \dot{q}\,dp - \dot{p}\,dq + \frac{\partial H}{\partial t}\,dt \tag{2.29}$$

but we can also write this as

$$dH = \frac{\partial H}{\partial p}\,dp + \frac{\partial H}{\partial q}\,dq + \frac{\partial H}{\partial t}\,dt \tag{2.30}$$

and comparing the coefficients of $dp$ and $dq$ yields Hamilton’s equations:

$$\dot{q} = \partial H/\partial p, \qquad \dot{p} = -\partial H/\partial q. \tag{2.31}$$

The generalization to a multi-dimensional system is straightforward and we then have

$$\dot{q}_i = \partial H/\partial p_i, \qquad \dot{p}_i = -\partial H/\partial q_i. \tag{2.32}$$


For a system with $N$ degrees of freedom, the Euler-Lagrange equations provide a set of $N$ second
order differential equations. Hamilton’s equations, in contrast, are a set of $2N$ coupled first order
differential equations. Either set of equations can be integrated to obtain the evolution of the system.
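
As an illustration of integrating Hamilton’s equations directly, here is a minimal sketch for the harmonic oscillator $H = p^2/2m + m\omega^2 q^2/2$. The kick-drift-kick (leapfrog) scheme used here is one common choice of integrator, not something prescribed by the text, and the function names and step size are assumptions.

```python
def hamiltonian(q, p, m=1.0, omega=1.0):
    return 0.5 * p * p / m + 0.5 * m * omega**2 * q * q

def leapfrog_step(q, p, dt, m=1.0, omega=1.0):
    """One kick-drift-kick step of Hamilton's equations for the oscillator."""
    p -= 0.5 * dt * m * omega**2 * q    # half kick:  pdot = -dH/dq
    q += dt * p / m                     # full drift: qdot = +dH/dp
    p -= 0.5 * dt * m * omega**2 * q    # half kick
    return q, p

def evolve(q, p, dt, n_steps):
    for _ in range(n_steps):
        q, p = leapfrog_step(q, p, dt)
    return q, p

q0, p0 = 1.0, 0.0
q1, p1 = evolve(q0, p0, 0.001, 6283)    # roughly one period, 2 pi / omega
print(q1, p1)
print(hamiltonian(q1, p1) - hamiltonian(q0, p0))
```

After about one period the system returns close to its starting point in $q, p$ space and the value of the Hamiltonian is unchanged to high accuracy, as expected for a time-independent $H$.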

Figure 2.2: Adiabatic invariance concerns systems where the Hamiltonian evolves slowly with time.
The curves show contours of the Hamiltonian for such a system at three different times. At each 
time the system will orbit around a certain contour. Provided the system evolves slowly, the final 
contour is fully determined by the initial contour, and is independent of the details of how the system 
changed. The adiabatic invariant turns out to be the area within the contour. 

2.5 Adiabatic Invariance 

Hamilton’s equations are convenient for discussing adiabatic invariance. 

Consider a system with Hamiltonian $H(p, q, t)$. For constant $t$, the energy is conserved, and the
system will move around a contour of the Hamiltonian; i.e. around some closed loop in $q, p$ space (see
figure 2.2). Now ask what happens if there is some slow explicit time dependence of the Hamiltonian
(this might be some slow variation of a spring constant or perhaps some external contribution to
the potential energy). If the change in the Hamiltonian is sufficiently slow the system will appear
to conserve energy over short time-scales (successive orbits will be very similar) but over long
time-scales there will be secular evolution: the system will evolve through a series of quasi-periodic
orbits and the energy will also change with time.

At time $t_1$ the Hamiltonian is $H_1(q, p) = H(q, p, t_1)$ and the possible orbits form a 1-parameter
family; these are just the contours of $H_1$. Similarly at time $t_2$ we have $H_2(q, p)$ and we have another
set of orbits. For a given initial energy $E_1$, and for sufficiently slow variation of the potential, the
system will end up in a specific energy orbit $E_2$, independent of the details of how the variation
took place. Equivalently, we can say that something is conserved in the evolution. It turns out
that that quantity, which we call an adiabatic invariant, is just the area within the orbit in
position-momentum space:

$$I = \oint p\,dq \tag{2.33}$$

is conserved.

The derivation of this powerful and simple result is rather tedious (see L+L Mechanics for the
details). The flavor of this result can, however, be appreciated with a simple example. Consider a
simple system which is a conker on a string of length $l$ and rotating at some initial velocity $v$. What
happens if we slowly shorten the string? Shortening the string does not exert any torque on the
conker, so the angular momentum must be conserved and therefore $lv$ = constant and so $v \propto 1/l$; as
the string shortens the velocity of the particle increases. The energy of the conker is not conserved;
since the string is in tension, the agent shortening the string must do work. The energy of the
system is entirely kinetic and increases as $E \propto v^2 \propto 1/l^2$. The angular frequency of the system is
$\omega = v/l$, which is also proportional to $v^2$, and so we find that the energy evolves in proportion to the
frequency:

$$E \propto \omega(t). \tag{2.34}$$


Let us derive the equivalent scaling law for a system consisting of a simple harmonic oscillator
with a time varying spring constant using (2.33). The Hamiltonian is

$$H = \frac{1}{2}\,p^2/m + \frac{1}{2}\,m\,\omega^2(t)\,q^2. \tag{2.35}$$

For $\omega$ = constant the solution is an elliptical orbit in $q, p$ space with semi-axes $p_0 = \sqrt{2mE}$
and $q_0 = \sqrt{2E/m\omega^2}$, so the area of the ellipse is

$$I \propto p_0 q_0 \propto E/\omega \tag{2.36}$$

so, if $I$ is conserved, we find $E(t) \propto \omega(t)$ as for the conker, provided the frequency changes slowly
(the requirement is that the fractional change in the frequency over one orbit should be small).

Another way to get to this result, again for a simple harmonic oscillator with time varying
frequency, is to integrate the equation of motion:

$$\ddot{x} + \omega^2(t)\,x = 0. \tag{2.37}$$

For $\omega$ = constant the solution is $x = a e^{i\omega t}$, so let’s look for a solution with slowly varying amplitude,
$x = a(t)e^{i\varphi(t)}$, where the phase $\varphi(t) = \int \omega\,dt$ reduces to $\omega t$ for constant frequency. Performing the
differentiation with respect to time yields

$$\ddot{x} + \omega^2(t)\,x = 0 = i(a\dot{\omega} + 2\dot{a}\omega)\,e^{i\varphi} + \ddot{a}\,e^{i\varphi} \tag{2.38}$$

but if $a(t)$ is slowly varying we can neglect $\ddot{a}$ as compared to the other terms, and we find that the
amplitude must obey the equation

$$a\dot{\omega} + 2\dot{a}\omega = 0 \tag{2.39}$$

which has solution $a(t) \propto \omega(t)^{-1/2}$. The kinetic energy of the system is $E \propto \dot{x}^2 \propto (\omega a)^2 \propto \omega$, so
again we obtain the scaling law $E(t) \propto \omega(t)$.
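
The scaling law can also be checked by brute-force integration. The sketch below integrates the oscillator with a leapfrog scheme while the frequency is ramped slowly from 1 to 2. The linear ramp, its duration, the step size and all names are illustrative assumptions and not part of the derivation above.

```python
def ramp(t, t_ramp=2000.0):
    """Hypothetical slow frequency ramp: omega doubles linearly over t_ramp."""
    return 1.0 + t / t_ramp

def energy(q, p, omega, m=1.0):
    return 0.5 * p * p / m + 0.5 * m * omega**2 * q * q

def evolve_oscillator(omega_of_t, t_end, dt, q=1.0, p=0.0, m=1.0):
    """Leapfrog integration of H = p^2/2m + m omega(t)^2 q^2 / 2; returns the final (q, p)."""
    n = int(round(t_end / dt))
    t = 0.0
    for _ in range(n):
        p -= 0.5 * dt * m * omega_of_t(t)**2 * q
        q += dt * p / m
        t += dt
        p -= 0.5 * dt * m * omega_of_t(t)**2 * q
    return q, p

T_END = 2000.0                              # many oscillation periods, so the change is slow
E_start = energy(1.0, 0.0, ramp(0.0))
q_end, p_end = evolve_oscillator(ramp, T_END, 0.01)
E_end = energy(q_end, p_end, ramp(T_END))
invariant_ratio = (E_end / ramp(T_END)) / (E_start / ramp(0.0))
print(E_end / E_start, invariant_ratio)
```

The energy roughly doubles along with the frequency while $E/\omega$ stays very nearly constant. Making the ramp faster (a shorter `T_END` together with a matching `t_ramp`) should degrade the conservation of $E/\omega$.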

Adiabatic invariance is closely connected to Liouville’s theorem. The latter says that particles
in 6-dimensional phase space behave like an incompressible fluid. If we populate the orbits inside
some given energy contour of $H_1(p, q)$, it makes sense then that the area that these particles end up
occupying at the end should be unchanged.

Lastly, let us think about the quantum mechanics of this system. The energy levels for a constant
frequency oscillator are

$$E = (n + 1/2)\,\hbar\omega \tag{2.40}$$

so the classical adiabatic invariant behavior corresponds to conservation of $n$, which seems reasonable.

For the simple case of SHM with time varying frequency the above approaches yield the same
answer; for more complicated systems the most general and powerful approach is to use (2.33).


2.6 Problems 


2.6.1 Extremal paths 

Consider a photon propagating through a medium with inhomogeneous refractive
index $n(\mathbf{x})$. Show that the time of flight is

$$t = \frac{1}{c}\int dl \left[\frac{dx_i}{dl}\frac{dx_i}{dl}\right]^{1/2} n(x_i) \tag{2.41}$$

where $\mathbf{x}(l)$ is the path of the photon and where $l$ is an arbitrary parameterisation along the path.
According to Fermat’s principle the variation of the time of flight $\delta t$ vanishes for the actual ray.
Show that the Euler-Lagrange equations for this variation problem are ‘Snell’s law of refraction’

$$\frac{d(n\hat{\mathbf{k}})}{dl} = \nabla n \tag{2.42}$$

where $\hat{\mathbf{k}}$ is the photon direction and where the parameterisation has been chosen so that $|d\mathbf{x}/dl| = 1$.

Use Snell’s law to estimate the angular deflection of a ray passing through a region of size $L$ with
some refractive index fluctuation $\delta n$.

If one observes a source at distance $D$ through an inhomogeneous medium with random refractive
index fluctuations $\delta n$ with a coherence scale $L$, how large does $\delta n$ need to be to cause multipath
propagation?

2.6.2 Schwarzschild Trajectories 

In general relativity the metric tensor $g_{\mu\nu}$ is defined such that the proper time interval corresponding
to coordinate separation $dx$ is $(d\tau)^2 = -g_{\mu\nu}(x)\,dx^\mu dx^\nu$. Massive particles move along world lines
that extremize the proper time:

$$\delta\int d\tau = \delta\int d\lambda \sqrt{-g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}} = 0 \tag{2.43}$$

where $\lambda$ is some arbitrary parameterisation of the path.

a) Use this variational principle to show that if $g_{\mu\nu}(x)$ is independent of one of the coordinates
$x^\alpha$ this implies that

$$U_\alpha = g_{\alpha\beta}U^\beta = \text{constant} \tag{2.44}$$

where $U = dx/d\tau$ is the 4-velocity.

b) The Schwarzschild metric for a mass $m$ is (in units such that $c = G = 1$)

$$-(d\tau)^2 = -\left(1 - \frac{2m}{r}\right)(dt)^2 + \left(1 - \frac{2m}{r}\right)^{-1}(dr)^2 + r^2\left((d\theta)^2 + \sin^2\theta\,(d\phi)^2\right). \tag{2.45}$$

Use the result from part a) to obtain the ‘energy equation’ for particles on radial orbits ($\theta$, $\phi$ constant)
as

$$(dr/d\tau)^2 = \ldots \tag{2.46}$$

Note: You don’t need to know any GR to answer this question!


2.6.3 Lagrangian electrodynamics 

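From the stationarity of the action

$$ S = \int_{t_1}^{t_2} dt \, L(q_i, \dot{q}_i, t) \tag{2.47} $$

for a system with generalised coordinates $q_i$, derive the Euler-Lagrange equations

$$ \frac{d(\partial L/\partial \dot{q}_i)}{dt} = \partial L/\partial q_i \tag{2.48} $$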


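The Lagrangian for a particle of mass $m$ and charge $q$ moving in an electromagnetic field is given
by

$$ L(\mathbf{x}, \dot{\mathbf{x}}) = \frac{1}{2} m \dot{\mathbf{x}}^2 + \frac{q}{c} \mathbf{A} \cdot \dot{\mathbf{x}} - q\phi \tag{2.49} $$

where $\mathbf{A}(\mathbf{x}, t)$ and $\phi(\mathbf{x}, t)$ are the vector and scalar potentials.

Show that the momentum conjugate to $\mathbf{x}$ is

$$ \mathbf{p} = m\dot{\mathbf{x}} + \frac{q}{c}\mathbf{A} \tag{2.50} $$

and that the Euler-Lagrange equation generates the Lorentz force law

$$ m\ddot{\mathbf{x}} = q\left[\mathbf{E} + \frac{1}{c}\dot{\mathbf{x}} \times \mathbf{B}\right] \tag{2.51} $$

where $\mathbf{B} = \nabla \times \mathbf{A}$ and $\mathbf{E} = -\nabla\phi - (1/c)\,\partial\mathbf{A}/\partial t$. (You may make use of the identity $\mathbf{v} \times (\nabla \times \mathbf{A}) =
\nabla(\mathbf{v} \cdot \mathbf{A}) - (\mathbf{v} \cdot \nabla)\mathbf{A}$.)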



Chapter 3 

Random Fields 


Random fields and random, or stochastic, processes are ubiquitous in astronomy. The radiation 
field entering our instruments, the distribution of stars and galaxies, the distribution of wave-like 
disturbances in the early universe, in spiral galaxies or in the sun, the distribution of electrons in a 
CCD image; all of these are random processes. Here we will introduce statistics which are useful for 
describing such processes and some useful mathematical tools. 


3.1 Descriptive Statistics 

Let’s consider, for concreteness, a random scalar function of position $f(\mathbf{r})$, though we could equally
well have chosen a vector function like the electric field, and it might be a function of space and time,
or just of time, or of 6-dimensional phase-space etc. Let us also consider statistically homogeneous
processes, which fluctuate, but in which the statistical character of the fluctuations does not vary
with location.

The most general description would be some kind of probability density functional $P(f(\mathbf{r}))\,D[f(\mathbf{r})]$
giving the probability to observe a particular configuration of the field $f(\mathbf{r})$. This probability can
be thought of as giving the distribution over an ensemble of realizations, or alternatively one might
think of it as giving the distribution over samples drawn from a single infinite realization at randomly
chosen locations.


3.1.1 N-Point Distribution Functions

A useful description is provided by the hierarchy of N-point distribution functions. The 1-point
distribution function is

$$ p(f)\,df \tag{3.1} $$

and gives the probability to observe a field value $f$ at some randomly chosen point in space. The
2-point distribution function is

$$ p(f_1, f_2)\,df_1\,df_2 \tag{3.2} $$

and gives the probability to observe $f(\mathbf{r}_1) = f_1$ and $f(\mathbf{r}_2) = f_2$ at two positions $\mathbf{r}_1$, $\mathbf{r}_2$. For statistically
homogeneous processes this will depend only on the separation $\mathbf{r}_1 - \mathbf{r}_2$, and for a statistically
isotropic process it only depends on the modulus of the separation $|\mathbf{r}_1 - \mathbf{r}_2|$.

One can readily generalize this to arbitrary numbers of points, and the whole hierarchy constitutes
a full description of the random process. The utility of this approach is that useful physics can
often be extracted from a reduced approximate description in terms of a few low order distribution
functions.

3.1.2 N-Point Correlation Functions

We obtain the N-point correlation functions by integrating over the distribution functions. For
example, the two-point correlation function is

$$ \xi(\mathbf{r}_{12}) = \langle f_1 f_2 \rangle = \int df_1 \int df_2 \, f_1 f_2 \, p(f_1, f_2) \tag{3.3} $$

which again depends only on the separation of the pair of measurement points.

This can be generalized to give the three-point correlation function and so on.

The N-point correlation functions are moments of the corresponding distribution functions.


3.2 Two-point Correlation Function 


The two point correlation function (or auto-correlation function) is very useful as it allows one to
compute the variance of any linear function of the random field. For example, the variance of the
field itself is $\langle f^2 \rangle$ and is equal to the value of the auto-correlation function at zero lag: $\xi(0)$.

For a less trivial example, consider the average of the field over some averaging cell:

$$ \bar{f} = \frac{1}{V} \int_V d^3r \, f(\mathbf{r}) \tag{3.4} $$

with $V$ the volume of the cell. This is also a linear function of $f$. The variance, or mean square
value, of $\bar{f}$ is obtained by writing down two copies of the integral (3.4), with the second having
integration variable $\mathbf{r}'$, and enclosing it within the $\langle \ldots \rangle$ ensemble averaging operator:

$$ \langle \bar{f}^2 \rangle = \frac{1}{V^2} \left\langle \int_V d^3r \, f(\mathbf{r}) \int_V d^3r' \, f(\mathbf{r}') \right\rangle \tag{3.5} $$

Rearranging the terms in this double integral and realising that the $\langle \ldots \rangle$ operator only acts on the
stochastic variables $f(\mathbf{r})$, $f(\mathbf{r}')$ gives

$$ \langle \bar{f}^2 \rangle = \frac{1}{V^2} \int_V d^3r \int_V d^3r' \, \langle f(\mathbf{r}) f(\mathbf{r}') \rangle = \frac{1}{V^2} \int_V d^3r \int_V d^3r' \, \xi(\mathbf{r} - \mathbf{r}'). \tag{3.6} $$

So the variance of the average field may be computed as a double integral over the 2-point function.

It is sometimes the case that the range of correlations of the field is limited, so $\xi(r)$ is appreciable
only within some coherence length $r_c$ and is negligible for $r \gg r_c$. If the size of the averaging cell is
large compared to the coherence length, $r_c \ll V^{1/3}$, then

$$ \int_V d^3r' \, \xi(\mathbf{r}' - \mathbf{r}) \simeq \int d^3r \, \xi(\mathbf{r}) \tag{3.7} $$

where the range of integration is unrestricted, provided the point $\mathbf{r}$ lies at least a distance $r_c$ from
the walls of the cell, and the variance is then

$$ \langle \bar{f}^2 \rangle \simeq \xi(0)\, r_c^3 / V = \langle f^2 \rangle\, r_c^3 / V \tag{3.8} $$

which says that the rms value of the averaged field $\bar{f}$ will be approximately the rms value of the
field $f$ divided by $\sqrt{N}$, with $N = V/r_c^3$ the number of coherence volumes within the cell.
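
As a rough numerical check of (3.6) and (3.8), the following sketch evaluates the 1-dimensional analogue of the double integral for a cell of length $L$ and compares it with the estimate obtained by letting the inner integral run over all separations. The Gaussian form of $\xi$, the parameter values and the helper name `cell_average_variance` are assumptions made purely for illustration.

```python
import numpy as np

def cell_average_variance(xi, cell_size, n=2000):
    """1-D analogue of (3.6): (1/L^2) times the double integral of xi(r - r')
    over a cell of length L, evaluated with a midpoint rule on n points."""
    r = (np.arange(n) + 0.5) * (cell_size / n)   # midpoints of n sub-intervals
    separation = r[:, None] - r[None, :]         # all pairwise separations
    return xi(separation).mean()                 # mean over pairs = (1/L^2) * double integral

# Assumed (illustrative) correlation function: unit variance, coherence length r_c.
r_c = 1.0
cell_size = 100.0
xi = lambda r: np.exp(-0.5 * (r / r_c) ** 2)

exact = cell_average_variance(xi, cell_size)
# 1-D analogue of (3.8): the integral of xi over all separations, divided by L.
approx = np.sqrt(2.0 * np.pi) * r_c / cell_size
print(exact, approx)
```

The two numbers should agree to within a correction of order $r_c/L$, which comes from points lying within a coherence length of the cell walls.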


3.3 Power Spectrum 

The 2-point function $\xi(\mathbf{r})$ and its generalizations to N points are real-space statistics. The translational
invariance of statistically homogeneous fields suggests that Fourier-space or spectral statistics
may also be useful.

The Fourier transform of the field $f$ is

$$ \tilde{f}(\mathbf{k}) = \int d^3r \, f(\mathbf{r})\, e^{i\mathbf{k} \cdot \mathbf{r}} \tag{3.9} $$

and the power spectrum is proportional to the expectation value of the squared modulus of $\tilde{f}(\mathbf{k})$

$$ P(\mathbf{k}) \propto \langle |\tilde{f}(\mathbf{k})|^2 \rangle. \tag{3.10} $$

The tricky thing here is getting the constant of proportionality since we are dealing with a field of
infinite extent. For a random field occupying some large but finite volume Parseval’s theorem tells us
that $(2\pi)^{-3} \int d^3k \, |\tilde{f}(\mathbf{k})|^2 = \int d^3r \, f^2(\mathbf{r})$ and the latter integral, and therefore also $|\tilde{f}(\mathbf{k})|^2$, increase
in proportion to the volume, so to get a sensible measure of the power we need somehow to divide
$\langle |\tilde{f}(\mathbf{k})|^2 \rangle$ by some suitably infinite volume factor.

To make the definition of the power precise, consider the expectation of the product of the
transform at two different spatial frequencies $\mathbf{k}$, $\mathbf{k}'$, or more specifically, the average of the
product $\tilde{f}(\mathbf{k}) \tilde{f}^*(\mathbf{k}')$ with $\tilde{f}^*$ the complex conjugate of $\tilde{f}$. Writing out two copies of (3.9), wrapping
them in the averaging operator $\langle \ldots \rangle$, and re-arranging terms yields


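$$ \langle \tilde{f}(\mathbf{k}) \tilde{f}^*(\mathbf{k}') \rangle = \int d^3r \int d^3r' \, \langle f(\mathbf{r}) f(\mathbf{r}') \rangle \, e^{i(\mathbf{k} \cdot \mathbf{r} - \mathbf{k}' \cdot \mathbf{r}')} \tag{3.11} $$

Now $\langle f(\mathbf{r}) f(\mathbf{r}') \rangle = \xi(\mathbf{r} - \mathbf{r}')$ so on changing the second integration variable from $\mathbf{r}'$ to $\mathbf{z} = \mathbf{r}' - \mathbf{r}$ the
double integral separates into a product of integrals and we have

$$ \langle \tilde{f}(\mathbf{k}) \tilde{f}^*(\mathbf{k}') \rangle = \int d^3r \, e^{i(\mathbf{k} - \mathbf{k}') \cdot \mathbf{r}} \int d^3z \, \xi(\mathbf{z})\, e^{-i\mathbf{k}' \cdot \mathbf{z}} \tag{3.12} $$

or

$$ \langle \tilde{f}(\mathbf{k}) \tilde{f}^*(\mathbf{k}') \rangle = (2\pi)^3 \delta(\mathbf{k} - \mathbf{k}') P(\mathbf{k}) \tag{3.13} $$

where we have recognized the first integral in (3.12) as a representation of the Dirac $\delta$-function and
where we have now defined the power spectrum as

$$ P(\mathbf{k}) = \int d^3r \, \xi(\mathbf{r})\, e^{i\mathbf{k} \cdot \mathbf{r}} \tag{3.14} $$

(Since $\xi(\mathbf{z}) = \xi(-\mathbf{z})$, the sign of the exponent in the second integral of (3.12) is immaterial.)

Equation (3.13) tells us that different Fourier modes (ie $\mathbf{k}' \neq \mathbf{k}$) are completely uncorrelated.
This is a direct consequence of the assumed translational invariance, or statistical homogeneity, of
the field. On the other hand, for $\mathbf{k} = \mathbf{k}'$ the infinite volume factor in $\langle \tilde{f}(\mathbf{k}) \tilde{f}^*(\mathbf{k}) \rangle = \langle |\tilde{f}(\mathbf{k})|^2 \rangle$ is
supplied by the Dirac $\delta$-function.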

We have only computed $\langle \tilde{f}(\mathbf{k}) \tilde{f}^*(\mathbf{k}') \rangle$ here. Other correlation coefficients such as $\langle \tilde{f}(\mathbf{k}) \tilde{f}(\mathbf{k}') \rangle$ can
be obtained using the fact that $\tilde{f}(\mathbf{k}) = \tilde{f}^*(-\mathbf{k})$ which follows from the assumed reality of $f(\mathbf{r})$.

It is interesting to contrast the character of the fields in real-space and in Fourier-space. In real
space, $f(\mathbf{r})$ will generally have extended correlations, $\langle f(\mathbf{r}) f(\mathbf{r}') \rangle \neq 0$ for $\mathbf{r} \neq \mathbf{r}'$, but is statistically
homogeneous. In Fourier space there are no extended correlations (the field $\tilde{f}(\mathbf{k})$ is completely
incoherent) but the field is inhomogeneous since, for example, $\langle |\tilde{f}(\mathbf{k})|^2 \rangle$ varies with position $\mathbf{k}$.

Properties of the power spectrum: 

• The power spectrum and auto-correlation function are Fourier transform pairs of one another.
This is the Wiener-Khinchin theorem.

• The power spectrum tells us how the variance of the field is distributed over spatial frequency.
Taking the inverse transform of (3.14) at $\mathbf{r} = 0$ gives

$$ \langle f^2 \rangle = \xi(0) = \int \frac{d^3k}{(2\pi)^3} P(\mathbf{k}) \tag{3.15} $$

so the total variance of the field can be obtained by integrating the power spectrum.

Figure 3.1: Illustration of the relationship between the auto-correlation function $\xi(r)$ and the power
spectrum $P(k)$ for the important case where the former has a bell-shaped form with constant asymptote
as $r \to 0$ and with width $r_c$.


• If the field is statistically isotropic then the power spectrum depends only on the modulus of
the wave vector: $P(\mathbf{k}) = P(k)$.

• If $f(\mathbf{r})$ is dimensionless then so is $\xi(\mathbf{r})$ and therefore from (3.14) $P(\mathbf{k})$ has units of volume.
This is for fields in three spatial dimensions. Similarly, the power spectrum of some temporal
process $f(t)$ has units of time etc.

• One sometimes sees the power expressed as $\Delta^2(k) = k^3 P(k)/2\pi^2$, in terms of which the field
variance is $\langle f^2 \rangle = \int d\ln k \, \Delta^2(k)$, so $\Delta^2(k)$ gives the contribution to the field variance per
log-interval of wave number, and we have $\Delta^2(k) \sim \xi(r \sim 1/k)$.

• If the field $f(\mathbf{r})$ is incoherent, by which we mean that the values of the field at different points
are uncorrelated, so $\xi(\mathbf{r}) \propto \delta(\mathbf{r})$, then the power spectrum is constant. Such fields are referred
to as white-noise.

• If the auto-correlation function is a bell-shaped function with coherence length $r_c$, then from
(3.14) the power spectrum will be flat for $k \ll 1/r_c$, since then $e^{i\mathbf{k} \cdot \mathbf{r}} \simeq 1$ where $\xi$ is non-negligible.
The value of the power at these low frequencies is then $P(k \ll 1/r_c) \simeq \int d^3r \, \xi(\mathbf{r}) \sim
\xi(0) r_c^3$. For $k \gg 1/r_c$ the power will be small since we then have many oscillations of $e^{i\mathbf{k} \cdot \mathbf{r}}$
within $r \sim r_c$ which will tend to cancel. These results are illustrated in figure 3.1.
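
The Wiener-Khinchin relation and the statement that the variance is the integral of the power have exact discrete counterparts for a periodic, sampled field. The sketch below is an illustration only: the field is an arbitrary white-noise sample, the normalization by $n$ is the discrete stand-in for the volume factor, and none of the variable names come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
f = rng.standard_normal(n)                      # any real, periodic sampled field will do

power = np.abs(np.fft.fft(f)) ** 2 / n          # periodogram: |f_k|^2 divided by the "volume" n
xi_from_power = np.fft.ifft(power).real         # inverse transform of the power

# Direct circular auto-correlation: average of f(r) f(r + lag) over position r.
xi_direct = np.array([np.mean(f * np.roll(f, -lag)) for lag in range(n)])

print(np.max(np.abs(xi_from_power - xi_direct)))   # agreement at rounding-error level
print(xi_from_power[0], np.mean(f ** 2))           # zero lag: the variance, as in (3.15)
```

For this single realization the identity is exact; the ensemble-averaged statement in the text follows on averaging both sides.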



3.4 Measuring the Power Spectrum 

It is illuminating to consider estimating the power spectrum from a finite sample of the infinite
random field. We can write such a sample as $f_s(\mathbf{r}) = W(\mathbf{r}) f(\mathbf{r})$ where the function $W(\mathbf{r})$ describes
the sample volume geometry. For concreteness, imagine $W$ to be unity within a cubical sample
volume of side $L$ and $W = 0$ otherwise.

The Fourier transform of the finite sample is, from the convolution theorem, the convolution of
the transforms $\tilde{f}$ and $\tilde{W}$:

$$ \tilde{f}_s(\mathbf{k}) = \int \frac{d^3k'}{(2\pi)^3} \tilde{f}(\mathbf{k}') \tilde{W}(\mathbf{k} - \mathbf{k}'). \tag{3.16} $$

This says that the transform of the sample is a somewhat smoothed version of the intrinsically
incoherent $\tilde{f}(\mathbf{k})$. The width of the smoothing function $\tilde{W}$ is $\Delta k \sim 1/L$, so the transform $\tilde{f}_s(\mathbf{k})$

Figure 3.2: The left panel, $f(\mathbf{r})$, shows a realization of a 2-dimensional random field ‘windowed’
by the function $W(\mathbf{r})$ which is a disk of radius $R$. The random field is Gaussian white noise filtered by
smoothing with a Gaussian kernel with scale length $\sigma \ll R$. The right-hand panel, $P(\mathbf{k})$, shows the power
spectrum. The ‘speckly’ nature of the power spectrum is readily apparent. The overall extent of the
power spectrum is $k_{\max} \sim 1/\sigma$ while the size of the individual speckles is on the order $\Delta k \sim 1/R$.
The speckly nature of the power spectrum is not specific to the choice of Gaussian random fields. As
one makes the window function (or survey size) larger, the size of the speckles decreases. The
fractional precision with which one can measure the power averaged over some region of $k$-space is
$\Delta P/P \sim 1/\sqrt{N}$ where $N$ is the number of speckles. Estimating the statistical uncertainty in power
spectra is essentially a counting exercise.


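will be coherent over scales $\delta k \ll 1/L$ but will be incoherent on larger scales. The act of sampling
introduces finite range correlations in the transform.

If we square the sample transform and take the expectation value we find

$$ \langle |\tilde{f}_s(\mathbf{k})|^2 \rangle = \int \frac{d^3k'}{(2\pi)^3} \int \frac{d^3k''}{(2\pi)^3} \tilde{W}(\mathbf{k} - \mathbf{k}') \tilde{W}^*(\mathbf{k} - \mathbf{k}'') \langle \tilde{f}(\mathbf{k}') \tilde{f}^*(\mathbf{k}'') \rangle \tag{3.17} $$

and using (3.13) allows one to evaluate one of the integrals to give

$$ \langle |\tilde{f}_s(\mathbf{k})|^2 \rangle = \int \frac{d^3k'}{(2\pi)^3} |\tilde{W}(\mathbf{k} - \mathbf{k}')|^2 P(\mathbf{k}') \tag{3.18} $$

ie a convolution of the power spectrum with $|\tilde{W}(\mathbf{k})|^2$.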

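For naturally occurring random fields it is often the case that the power spectrum $P(\mathbf{k})$ is a
smoothly varying function, with fractional change in the power at two points $\mathbf{k}$, $\mathbf{k}' = \mathbf{k} + \delta\mathbf{k}$ being
small, provided that $\delta k \ll k$. For spatial frequencies $k \gg 1/L$ then $P(\mathbf{k})$ will be effectively constant
over the range of frequencies $|\mathbf{k} - \mathbf{k}'| \lesssim 1/L$ that $|\tilde{W}(\mathbf{k} - \mathbf{k}')|^2$ is non negligible, and therefore

$$ \langle |\tilde{f}_s(\mathbf{k})|^2 \rangle \simeq P(\mathbf{k}) \int \frac{d^3k'}{(2\pi)^3} |\tilde{W}(\mathbf{k}')|^2 = P(\mathbf{k}) \int d^3r \, W^2(\mathbf{r}) = L^3 P(\mathbf{k}) \tag{3.19} $$

where we have used Parseval’s theorem. This means that a fair approximate estimator of the power
is

$$ \hat{P}(\mathbf{k}) \simeq L^{-3} |\tilde{f}_s(\mathbf{k})|^2. \tag{3.20} $$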

It is interesting to note that this estimator $\hat{P}(\mathbf{k})$ does not really converge to the true spectrum as
the sample volume tends to infinity. This is because of the ‘grainy’ or ‘blotchy’ nature of $\tilde{f}_s$ already
commented on (see figure 3.2). What happens is that the number of grains within some finite volume
of frequency space becomes larger as the volume increases, so the average of $\hat{P}(\mathbf{k})$ over this finite
region will tend to the true value, but at the microscopic scale $\hat{P}(\mathbf{k})$ still fluctuates from point to
point with $\langle (\hat{P}(\mathbf{k}) - P(\mathbf{k}))^2 \rangle^{1/2} \sim P(\mathbf{k})$. This means that one can estimate the fractional
uncertainty in the estimated power simply by counting the number of independent cells.
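
The ‘counting’ argument can be illustrated with a small 1-dimensional experiment. This is a sketch under stated assumptions: the field is unit-variance Gaussian white noise (so the true power is 1 in these units), the sample is periodic so each FFT mode is an independent cell, and the bin size of 64 modes is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2 ** 16
f = rng.standard_normal(n)                            # white noise, true P = 1 in these units

# Raw estimator in the spirit of (3.20): squared transform divided by the sample size.
# Modes 1 .. n/2 - 1 are statistically independent of one another.
p_hat = np.abs(np.fft.rfft(f)[1:n // 2]) ** 2 / n
raw_scatter = p_hat.std() / p_hat.mean()              # fractional scatter of single modes

# Average the estimator in bins of m independent modes.
m = 64
n_bins = p_hat.size // m
binned = p_hat[:n_bins * m].reshape(n_bins, m).mean(axis=1)
binned_scatter = binned.std() / binned.mean()

print(raw_scatter)                                    # of order unity, however large n is
print(binned_scatter, 1.0 / np.sqrt(m))               # roughly 1/sqrt(number of modes per bin)
```

Increasing `n` does not reduce the single-mode scatter; it only supplies more modes per unit frequency over which to average.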

The above results largely apply for arbitrary sample volume $W(\mathbf{r})$, and have a bearing on the
design of experiments to measure the power spectrum for e.g. galaxy clustering. Say you can afford
to sample a certain finite volume of space, but can choose how to lay out a set of survey fields. A
contiguous cubic survey would give $|\tilde{W}(\mathbf{k})|^2$ as a 3-dimensional sinc-squared function with a ‘central lobe’ of
width $\delta k \sim 1/L$. Laying out the fields in a broader grid may be advantageous as this will decrease
the width of the central lobe and increase the number of effectively independent samples of the power
considerably. However, it will also tend to result in extended side-lobes in the ‘window function’
$|\tilde{W}(\mathbf{k})|^2$ and, if one is trying to measure power at low frequencies, the signal will be contaminated
by power aliased from high frequencies. The mean value of the aliased power can be estimated and
subtracted, but the fluctuations in the aliased power increase the noise in one’s measurement. The
optimal sampling strategy depends on the actual power spectrum.


3.5 Moments of the Power Spectrum 


Consider a 1-dimensional field $f(r)$. The variance of the field is

$$ \langle f^2 \rangle = \int \frac{dk}{2\pi} P(k) = \xi(0) \tag{3.21} $$

so the variance is the zeroth moment of the power, and is also the auto-correlation function at zero
lag.

Higher moments of the power spectrum are also of considerable physical significance. Consider
a random field $F(r)$ which is the derivative of $f(r)$:

$$ F(r) = f'(r) = \frac{df}{dr}. \tag{3.22} $$

Taking the derivative in real space corresponds to multiplying by $ik$ (up to a sign set by the transform convention) in Fourier space, so

$$ \tilde{F}(k) = ik\tilde{f}(k) \tag{3.23} $$

and the power spectrum of $F$ is

$$ P_F(k) = k^2 P_f(k). \tag{3.24} $$

This means that the variance of the gradient $f'$ is

$$ \langle f'^2 \rangle = \langle F^2 \rangle = \int \frac{dk}{2\pi} k^2 P(k) \tag{3.25} $$

which is the second moment of the power spectrum. We can also write this in terms of the auto-correlation
function since if $P(k)$ is the transform of $\xi(r)$ then $k^2 P(k) = -(ik)^2 P(k)$ is minus the
transform of the second derivative of $\xi(r)$ so

$$ \langle f'^2 \rangle = -\left. \frac{d^2\xi}{dr^2} \right|_{r=0}. \tag{3.26} $$

We can also compute such quantities as $\langle f f' \rangle$ which is proportional to $\int dk \, k P(k)$, but for
isotropic fields $P(k)$ is an even function while $k$ is odd, so $\langle f f' \rangle = 0$. For such fields the covariance
matrix for derivatives of order $m$ and $n$ is non-zero only if $n - m$ is even. For example,
the co-variance of the field and its second derivative is

$$ \langle f f'' \rangle = -\int \frac{dk}{2\pi} k^2 P(k) \tag{3.27} $$

which, aside from the sign, is identical to $\langle f'^2 \rangle$.

3.6 Variance of Smoothed Fields


The power spectrum is very useful for computing the variance of fields which have been smoothed or
filtered with a smoothing kernel $W(\mathbf{r})$, a common example of which is the point spread function
of an instrument.

If the smoothed field is

$$ f_S(\mathbf{r}) = \int d^3r' \, f(\mathbf{r} - \mathbf{r}') W(\mathbf{r}') \tag{3.28} $$

then, by the convolution theorem, its transform is

$$ \tilde{f}_S(\mathbf{k}) = \tilde{f}(\mathbf{k}) \tilde{W}(\mathbf{k}). \tag{3.29} $$

The power spectrum of $f_S$ is then

$$ P_{f_S}(\mathbf{k}) = |\tilde{W}(\mathbf{k})|^2 P_f(\mathbf{k}) \tag{3.30} $$

and therefore the variance of the smoothed field is

$$ \langle f_S^2 \rangle = \int \frac{d^3k}{(2\pi)^3} |\tilde{W}(\mathbf{k})|^2 P_f(\mathbf{k}). \tag{3.31} $$

This is equivalent to (3.6) but is a single rather than double integration.
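
A minimal 1-dimensional sketch of (3.29) and (3.31), assuming a periodic sampled field, unit-variance white noise ($P_f = 1$ in these units) and a normalized box-car kernel of $m$ samples; these choices and all variable names are illustrative rather than taken from the text. For this case the smoothed variance should be $1/m$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 2 ** 16, 16
kernel = np.zeros(n)
kernel[:m] = 1.0 / m                           # normalized box-car smoothing kernel W
w_k = np.fft.fft(kernel)                       # its transform

# Discrete analogue of (3.31) with P_f = 1: the mean over modes plays the
# role of the integral over d^3k / (2 pi)^3.
var_spectral = np.mean(np.abs(w_k) ** 2)

# Direct check: smooth one realization via (3.29) and measure its variance.
f = rng.standard_normal(n)
f_s = np.fft.ifft(np.fft.fft(f) * w_k).real    # periodic convolution of f with the kernel
var_measured = f_s.var()

print(var_spectral, var_measured, 1.0 / m)
```

The spectral sum reproduces $1/m$ to rounding error, while the measured value scatters about it because a single realization contains only about $n/m$ independent smoothing cells.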


3.7 Power Law Power Spectra 

Many physical processes give rise to fields with power spectra which can be approximated as power
laws in temporal or spatial frequencies,

$$ P(\omega) \propto \omega^n \quad \text{or} \quad P(k) \propto k^n \tag{3.32} $$

where $n$ is the spectral index. As we have seen, an incoherent process in which the field values at
different places or times are uncorrelated gives a flat or ‘white’ spectrum with $n = 0$. Spectra with
indices $n > 0$ (with more power at high frequencies that is) are sometimes said to be blue while
those with $n < 0$ are called red. Some examples are shown in figure 3.3.

An example of a red spectrum is ‘Brownian noise’ obtained by integrating an incoherent process.
Physical realizations include the ‘drunkard’s walk’, and the displacement of a molecule being buffeted
by collisions in a gas. The displacement as a function of time has a spectrum $P(\omega) \propto \omega^n$ with $n = -2$.

Another example of a very red spectrum is the phase fluctuation in the wavefront from a distant
source introduced by atmospheric turbulence. This is a two-dimensional field with spectral index
$n = -11/3$.

Note that for a process with spectral index more negative than minus the number of dimensions
$N$, as is the case in the two above examples, the auto-correlation function $\xi$ is ill-defined since
$\xi(0) \sim \int d^N k \, k^n$ and this integral does not converge at low frequencies. This is not usually a serious
problem, since the power-law may only be obeyed over some range of frequencies, and there may
be some physical cut-off (such as the ‘outer-scale’ in atmospheric turbulence) which renders the
variance finite. In such cases it is more useful to define a structure function

$$ S(\mathbf{r}) = \langle (f(\mathbf{r}) - f(0))^2 \rangle = 2(\xi(0) - \xi(\mathbf{r})). \tag{3.33} $$

The infra-red divergence renders both $\xi(0)$ and $\xi(r)$ formally infinite, but the difference is well defined
provided $n > -(N + 2)$ and has a power-law form $S(r) \propto r^{-(N+n)}$. For atmospheric turbulence, for
instance, $N = 2$, $n = -11/3$ and the structure function for phase fluctuations is $S_\phi(r) \propto r^{5/3}$.

One particularly interesting class of processes are so-called ‘flicker-noise’ processes with $n = -N$.
For a temporal process, for instance, this would be $n = -1$, and such fields are often referred to as

Figure 3.3: Examples of 2-dimensional random fields with power-law like power spectra (surviving panel labels: $n = -4$, $n = -2$). These
were generated by first generating a white noise Gaussian random field (see below) and then
smoothing it with a small Gaussian kernel of scale-length $\sigma$. This smoothing introduces a coherence
in the field. The Fourier transform of the white noise field was then multiplied by $k^{n/2}$ and the
result was inverse transformed. The upper right panel shows 2-dimensional ‘flicker-noise’; such fields
have the same variance on all scales. The lower left panel is ‘white-noise’. The lower right panel
has $n = 2$ and is more uniform on large scales than white noise. Such fields, with $n > 0$ that is,
have $P(k) \to 0$ as $k \to 0$. Since $P(k)$ is the transform of $\xi(r)$ this means that such fields have
$\int d^2r \, \xi(r) = 0$. The auto-correlation function $\xi(r)$ for our $n = 2$ example has a positive peak of
width $\sim \sigma$ around $r = 0$, but is surrounded by a compensating ‘moat’ where $\xi(r)$ is negative. The
field values at pairs of points separated by a few times the coherence length are anti-correlated.

having a ‘1/f’ spectrum, since the power spectrum scales inversely with frequency ($f \to \omega$ in our
notation). In this case the contribution to the variance of the field is

$$ \langle f^2 \rangle = \int \frac{d\omega}{2\pi} P(\omega) \sim \int d\ln\omega. \tag{3.34} $$

Such fields have the characteristic that there is equal contribution to the variance per logarithmic 
interval of frequency, and are often said to be ‘scale-invariant’. One can generate something approx¬ 
imating a flicker noise process by drawing a wiggly curve with some long coherence length and some 
rms amplitude, and then adding to this another wiggly curve with the same amplitude but with half 
the coherence length and so on. 

Physical processes that have been claimed to approximate flicker noise include: 

• The brightness of quasars as a function of time. 

• The intensity of classical music (when averaged over a time-scale much longer than the period of
acoustic waves). This presumably reflects the interesting ‘hierarchical’ structure of such music, 
with notes of varying strength, phrases of varying strength, movements of varying strength. It 
is not the case for some other types of music. 

• The fluctuation in the resistance of carbon resistors as a function of time. 

• The large-angle fluctuations in the microwave background as revealed by the COBE satellite. 

• Seismic noise. 

• Deflection of light by gravity waves. 

• Telescope mirror roughness. 

• Variation in height of high tides over long periods of time. 

Not all of these are well understood (see Press article). 


3.8 Projections of Random Fields 

It is often the case that we observe a projection of some random field onto a lower dimensional 
space. Examples include the 3-dimensional galaxy density field projected onto the 2-dimensional sky, 
and the 3-dimensional atmospheric refractive index fluctuations projected onto the 2-dimensional 
wavefront. It is useful to be able to transform the power spectrum and auto-correlation function in 
the different spaces, either to predict the observed power, or, though this is generally more difficult, 
to de-project the observed power to reconstruct the power in the higher dimensional case. 

Consider a planar projection 


$$ F(x, y) = \int dz \, W(z) f(x, y, z) \tag{3.35} $$

where $W(z)$ is a normalized ‘box-car’ function of width $\Delta z$ and height $1/\Delta z$, so (3.35) gives the
average of $f(x, y, z)$ through a slab.

If the field $f(\mathbf{r})$ has a well defined coherence length $r_c$, and variance $\langle f^2 \rangle$, then one can crudely
picture the field as a set of contiguous domains or cells of size $\sim r_c$ within each of which the field is
assigned a constant, but randomly chosen, value with amplitude $f \sim \langle f^2 \rangle^{1/2}$. In this model, and if
the coherence length is small compared to the slab thickness, the projected field will be the average
over $N = \Delta z / r_c$ domains so we expect

$$ \langle F^2 \rangle \simeq \frac{1}{N} \langle f^2 \rangle \simeq \frac{r_c}{\Delta z} \langle f^2 \rangle. \tag{3.36} $$

The coherence scale of the projected field will be similar to that of the unprojected field, so we
expect

$$ \xi_{2D}(0) \simeq \frac{r_c}{\Delta z} \xi_{3D}(0). \tag{3.37} $$

In the same spirit, one might model a more general field with some 3-dimensional correlation function
$\xi_{3D}(r)$ as the superposition of components with coherence length $r$ and $\langle f^2 \rangle \sim \xi_{3D}(r)$, for which we
should have

$$ \xi_{2D}(r) \simeq \frac{r}{\Delta z} \xi_{3D}(r). \tag{3.38} $$

For a power law $\xi_{3D}(r) \propto r^{-\gamma_{3D}}$ the 2-dimensional correlation function will also be a power
law with $\gamma_{2D} = \gamma_{3D} - 1$. This is all assuming the slab is thick. If the slab is thin compared to the
coherence length one expects $\xi_{2D}(r) = \xi_{3D}(r)$, so for a finite slab, one would expect the 2-dimensional
auto-correlation function to have a locally power-law behavior for both $r \ll \Delta z$ and $r \gg \Delta z$, with slopes
$\gamma_{2D} = \gamma_{3D} - 1$ and $\gamma_{2D} = \gamma_{3D}$ respectively.

This order of magnitude result can be made more precise. The 2-dimensional auto-correlation is

$$ \begin{aligned}
\xi_{2D}(x, y) &= \langle F(x' + x, y' + y) F(x', y') \rangle \\
&= \int dz \int dz' \, W(z) W(z') \langle f(x' + x, y' + y, z) f(x', y', z') \rangle \\
&= \int dz \int dz' \, W(z) W(z') \, \xi_{3D}\!\left(\sqrt{x^2 + y^2 + (z - z')^2}\right)
\end{aligned} \tag{3.39} $$

and if $r = \sqrt{x^2 + y^2} \ll \Delta z$ we find

$$ \xi_{2D}(r) \simeq \int dz' \, W^2(z') \int dz \, \xi_{3D}\!\left(\sqrt{r^2 + z^2}\right) = \frac{1}{\Delta z} \int dz \, \xi_{3D}\!\left(\sqrt{r^2 + z^2}\right). \tag{3.40} $$

This allows one to transform from 3-D to 2-D. For a power law, this integral gives

$$ \xi_{2D}(r) \sim \frac{r}{\Delta z} \xi_{3D}(r) \tag{3.41} $$

provided $\gamma > 1$, which is just the condition that the slope should be such that $\xi_{2D}(r)$ be a decreasing
function of $r$. This agrees with the random-walk argument above.

The generalization of this approach to 3-dimensional spherical geometry (with the observer at 
the origin) leads to what is called Limber’s equation. 

One can also obtain a similar transformation law for the power spectrum. Imagine one generates
a 3-D random field as a sum of sinusoidal waves with appropriately chosen wavelengths $\lambda \sim 2\pi/k_*$
and amplitudes $f_k \propto \sqrt{P(k_*)}$. In projection, and assuming $k_* \Delta z \gg 1$, most of these waves will
suffer strong attenuation as positive and negative half cycles cancel one another. The only modes
which are not attenuated are those such that the phase along each line of sight varies by less than
a radian or so, or equivalently, those modes with $k_z \Delta z \lesssim 1$. These modes have wave-vector nearly
perpendicular to the line of sight through the slab. Thus projecting through a thick slab of thickness
$\Delta z$ in real-space has the effect of selecting modes in a narrow slice $\delta k_z \lesssim 1/\Delta z$ in Fourier space.
One therefore has the simple result that

$$ P_{2D}(k) \simeq P_{3D}(k)/\Delta z \tag{3.42} $$

where the constant of proportionality $1/\Delta z$ is consistent with the requirement that $P_{ND}$ should have
dimensions of $(\text{length})^N$.

For a power law spectrum $P \propto k^n$ the slope is the same in both three and two dimensions. This
is in accord with the reasonable requirement that if the field is incoherent in 3-D, so $n = 0$, then the
projected field should also be incoherent. It is also in accord with the results for transforming the
auto-correlation function above since if $P(k) = P_* (k/k_*)^n$ then

$$ \xi(r) = \int \frac{d^N k}{(2\pi)^N} P(k)\, e^{i\mathbf{k} \cdot \mathbf{r}} \sim P_* k_*^{-n} r^{-(N+n)} \int d^N y \, y^n e^{i\mathbf{y} \cdot \hat{\mathbf{r}}}. \tag{3.43} $$

The integral here is dimensionless and generally of order unity, so this tells us that a spectral index
$n$ corresponds to a correlation function slope $\gamma = n + N$ (with $\xi \propto r^{-\gamma}$ as above) so, for instance, $\gamma_{2D} = \gamma_{3D} - 1$ as before.

These results are true for a wide range of spectral indices, though certain values such as $n = 0$ are
special cases ($n = 0$ does not correspond to $\xi_{3D} \propto r^{-3}$ for instance). What happens in this case is
that the value of the dimensionless integral vanishes. For the integral to have a finite low-frequency
contribution requires $n > -N$. As discussed, for redder (ie more negative) indices, $\xi$ is ill defined,
but the structure function

$$ S(\mathbf{r}) = 2(\xi(0) - \xi(\mathbf{r})) = 2 \int \frac{d^N k}{(2\pi)^N} P(k) \left(1 - e^{i\mathbf{k} \cdot \mathbf{r}}\right) \tag{3.44} $$

is well-defined providing $n > -(2 + N)$.


3.9 Gaussian Random Fields 

A significant special class of random fields are Gaussian random fields, examples of which are legion.
Gaussian statistics arise whenever one has a random process which is the sum of a large number 
of independent disturbances. Gaussian random fields also arise from the quantum-mechanics of the 
early universe, and such fields play a major role in cosmology. 


3.9.1 Central Limit Theorem 

Consider a random variate $Y$ which is the sum of a large number of random components

$$ Y = \sum_{i=1}^{N} X_i \tag{3.45} $$

where the $X$ values are drawn randomly and independently from some probability distribution
function $p(X)$. The central limit theorem states that for large $N$, and provided $p(X)$ satisfies
certain reasonable conditions, the probability distribution for $Y$ tends to a universal form

$$ p_N(Y)\,dY = \frac{dY}{\sqrt{2\pi}\,\sigma_Y} \exp(-Y^2/2\sigma_Y^2) \tag{3.46} $$

where

$$ \sigma_Y^2 = N\sigma_X^2 \quad \text{and} \quad \sigma_X^2 = \int dX \, X^2 p(X). \tag{3.47} $$

We can prove this by induction. Let $Y$ denote the sum of $N$ random $X$ values, and $Y'$ the partial
sum of the first $N - 1$ values, so $Y = Y' + X_N$. The probability of some $Y$ is the sum over all $Y'$ of
$p_{N-1}(Y')$ times the probability that $X_N = Y - Y'$, or

$$ p_N(Y) = \int dY' \, p_{N-1}(Y')\, p(Y - Y'). \tag{3.48} $$

This is a convolution, so the Fourier transform of the probability distribution, also called the generating
function, is

$$ \tilde{p}_N(\omega) = \tilde{p}_{N-1}(\omega)\, \tilde{p}(\omega) \tag{3.49} $$

and since for $N = 1$ we have $\tilde{p}_1(\omega) = \tilde{p}(\omega)$,

$$ \tilde{p}_N(\omega) = \tilde{p}(\omega)^N. \tag{3.50} $$

This means that $\tilde{p}_N(\omega)$ is maximized where $\tilde{p}(\omega)$ is maximized, ie at $\omega = 0$, but will tend to be
much more tightly peaked, and so will only depend on the form of $\tilde{p}(\omega)$ very close to the origin.
Expanding the complex exponential factor in the Fourier transform $\tilde{p}(\omega)$ for $\omega \ll 1/\sigma_X$ gives

$$ \tilde{p}(\omega) = \int dX \, p(X)\, e^{i\omega X} = \int dX \, p(X) \left(1 + i\omega X - \omega^2 X^2/2 + \ldots\right). \tag{3.51} $$
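
The first term here is unity. The second term is $i\omega \langle X \rangle = 0$ (taking the origin such that $\langle X \rangle = 0$). The third term is $-\omega^2 \langle X^2 \rangle/2 =
-\omega^2 \sigma_X^2/2$ and therefore

$$ \tilde{p}(\omega) \simeq 1 - \omega^2 \sigma_X^2/2 \quad \text{for} \quad \omega\sigma_X \ll 1. \tag{3.52} $$

Raising this to the $N$th power gives

$$ \tilde{p}_N(\omega) = \tilde{p}(\omega)^N = (1 - \omega^2 \sigma_X^2/2)^N \to \exp(-\omega^2 N \sigma_X^2/2) \tag{3.53} $$

provided $N$ is large. Performing the inverse transform $\tilde{p}_N \to p_N$ we obtain

$$ p_N(Y) = \frac{1}{\sqrt{2\pi N \sigma_X^2}} \exp(-Y^2/2N\sigma_X^2) \tag{3.54} $$

QED.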
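
The theorem is easy to probe numerically. The sketch below assumes a particular parent distribution (uniform on $[-1/2, 1/2]$, chosen only because it is visibly non-Gaussian with $\langle X \rangle = 0$ and $\sigma_X^2 = 1/12$) and arbitrary values for the number of terms and trials.

```python
import numpy as np

rng = np.random.default_rng(2)
n_terms, n_trials = 30, 100_000

# Assumed parent distribution: uniform on [-1/2, 1/2], so <X> = 0 and sigma_X^2 = 1/12.
x = rng.uniform(-0.5, 0.5, size=(n_trials, n_terms))
y = x.sum(axis=1)                                   # one realization of Y per trial

sigma_y2_predicted = n_terms / 12.0                 # sigma_Y^2 = N sigma_X^2, as in (3.47)
sigma_y2_measured = y.var()

# For a Gaussian, about 68.3% of the values lie within one sigma of the mean.
frac_within_1sigma = np.mean(np.abs(y) < np.sqrt(sigma_y2_predicted))

print(sigma_y2_measured, sigma_y2_predicted)
print(frac_within_1sigma)
```

The variance relation holds for any number of terms; it is the shape of the distribution (here probed only through the one-sigma fraction) that approaches the Gaussian form as the number of terms grows.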


3.9.2 Multi-Variate Central Limit Theorem 

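Consider a random vector $\mathbf{Y} = \{Y_1, Y_2, \ldots, Y_m\}$ which is the sum of $N$ random vectors $\mathbf{X}$

$$ \mathbf{Y} = \sum^{N} \mathbf{X}. \tag{3.55} $$

The generating function is now

$$ \tilde{p}(\boldsymbol{\omega}) = \int d^m X \, p(\mathbf{X})\, e^{i\boldsymbol{\omega} \cdot \mathbf{X}}. \tag{3.56} $$

Expanding the exponential gives

$$ e^{i\boldsymbol{\omega} \cdot \mathbf{X}} = 1 + i\omega_i X_i - \omega_i X_i \omega_j X_j/2 \ldots \tag{3.57} $$

so assuming we have, as before, chosen the origin such that $\langle \mathbf{X} \rangle = 0$, we have

$$ \tilde{p}(\boldsymbol{\omega}) = 1 - m_{ij}\omega_i\omega_j/2 + \ldots \tag{3.58} $$

with

$$ m_{ij} = \int d^m X \, X_i X_j \, p(\mathbf{X}) = \langle X_i X_j \rangle \tag{3.59} $$

the covariance matrix for the $\mathbf{X}$ vectors. The same argument as above gives the generating function
for $\mathbf{Y}$ as

$$ \tilde{p}_N(\boldsymbol{\omega}) \simeq (1 - m_{ij}\omega_i\omega_j/2)^N = \exp(-M_{ij}\omega_i\omega_j/2) \tag{3.60} $$

with

$$ M_{ij} = N m_{ij} \tag{3.61} $$

and inverse transforming $\tilde{p}_N(\boldsymbol{\omega})$ gives

$$ p_N(\mathbf{Y})\, d^m Y = \frac{d^m Y}{(2\pi)^{m/2} \sqrt{|M|}} \exp(-Y_i M^{-1}_{ij} Y_j/2). \tag{3.62} $$

This is most easily verified by working in a ‘rotated frame’ such that $M_{ij}$ is diagonal, but is true in
general since the determinant $|M|$ and the quadratic form $Y_i M^{-1}_{ij} Y_j$ are invariants.

This calculation shows that the probability distribution function for a Gaussian random vector
$\mathbf{Y}$ is fully specified by the matrix of covariance elements $M_{ij} = \langle Y_i Y_j \rangle$.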

3.9.3 Gaussian Fields

A 1-dimensional Gaussian random field $f(r)$ can be thought of as a very large vector $\mathbf{f} = \{f_1, f_2, f_3, \ldots\}$
giving the values of the field $f_m$ at a set of very finely spaced points $r_m = m\Delta r$.

If the field is statistically homogeneous then the correspondingly huge covariance matrix is easily
generated since $\langle f_n f_m \rangle = \xi(|n - m|\Delta r)$.

A convenient prescription for generating a Gaussian random field on the computer is to make a
Fourier synthesis

$$ f(r) = \sum_k \tilde{f}_k e^{ikr} \tag{3.63} $$

with randomly chosen Fourier amplitudes $\tilde{f}_k$ with $|\tilde{f}_k| \propto \sqrt{P(k)}$. These are complex, but must
satisfy $\tilde{f}_{-k} = \tilde{f}_k^*$ to ensure reality of $f(r)$. This prescription clearly satisfies the conditions for the
CLT that $f(r)$ be a sum of independent random variables, and it is not particularly critical precisely
what distribution function is used to generate the $\tilde{f}_k$ values.

A key feature of a Gaussian random field is that all of its properties are completely specified by
the two point function $\xi(r)$, and therefore by the power spectrum $P(k)$. This is in contrast to a
general random field where only the variance statistics of the field are thus specified. Two general
random fields may have the same two-point function but their higher order correlation functions may
differ. For Gaussian random fields, all higher order statistics are uniquely specified by the two-point
function $\xi(r)$.

The above results are readily generalized to fields in multi-dimensional spaces. 


3.10 Gaussian N-point Distribution Functions


Given the power spectrum or auto-correlation function one can simply write down the $N$-point
distribution function for a GRF:

$$ p(f_1, f_2, \ldots, f_N)\, d^N f \tag{3.64} $$

or indeed for any set of linear functions of the field (including derivatives of arbitrary order etc).
All that one needs to do is compute the $N \times N$ covariance matrix elements, each of which can be
written as an integral over the power spectrum.

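For example, the 2-point distribution function $p(f_1, f_2)$ involves the $2 \times 2$ covariance matrix

$$ M_{ij} = \begin{bmatrix} \langle f_1^2 \rangle & \langle f_1 f_2 \rangle \\ \langle f_2 f_1 \rangle & \langle f_2^2 \rangle \end{bmatrix} = \begin{bmatrix} \xi(0) & \xi(r) \\ \xi(r) & \xi(0) \end{bmatrix}. \tag{3.65} $$

The determinant is $|M| = \xi_0^2 - \xi_r^2$ and the inverse is

$$ M^{-1}_{ij} = \frac{1}{\xi_0^2 - \xi_r^2} \begin{bmatrix} \xi_0 & -\xi_r \\ -\xi_r & \xi_0 \end{bmatrix} \tag{3.66} $$

so the bi-variate probability distribution is

$$ p(f_1, f_2)\, df_1\, df_2 = \frac{df_1\, df_2}{2\pi\sqrt{\xi_0^2 - \xi_r^2}} \exp\left[-\frac{\xi_0 f_1^2 - 2\xi_r f_1 f_2 + \xi_0 f_2^2}{2(\xi_0^2 - \xi_r^2)}\right]. \tag{3.67} $$

Note that if $\xi_r$ tends to zero for large $r$ then the bivariate distribution function factorizes:
$p(f_1, f_2) \rightarrow p(f_1)\,p(f_2)$.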


3.11 Gaussian Conditional Probabilities 

The conditional probability for the value of the field $f_2$ at $r_2$ given a measurement of the field $f_1$ at
$r_1$ is, by Bayes' theorem,

$$ p(f_2|f_1) = p(f_1, f_2)/p(f_1) \tag{3.68} $$

or

$$ p(f_2|f_1) = \frac{1}{\sqrt{2\pi\xi_0(1 - c^2)}} \exp\left(-\frac{(f_2 - c f_1)^2}{2\xi_0(1 - c^2)}\right) \tag{3.69} $$

where $c = \xi_r/\xi_0$.

This is a shifted Gaussian distribution with non-zero conditional mean

$$ \langle f_2 | f_1 \rangle = c f_1 \tag{3.70} $$

so the mean value is close to the measured value $f_1$ for $r_{12} \ll r_c$ (with $r_c$ the correlation length)
and relaxes to zero for $r_{12} \gg r_c$. The conditional variance is

$$ \langle (f_2 - c f_1)^2 | f_1 \rangle = \xi_0(1 - c^2) \tag{3.71} $$

which is much smaller than the unconstrained variance for $r_{12} \ll r_c$.
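As a rough numerical check of the conditional mean and variance, one can draw correlated pairs and select those whose first member lies near a chosen value. The numbers below ($\xi_0 = 1$, $\xi_r = 0.6$, the bin width and the sample size) are arbitrary assumptions made for illustration.

```python
import numpy as np

xi0, xir = 1.0, 0.6                  # assumed values of xi(0) and xi(r)
c = xir / xi0
rng = np.random.default_rng(0)
cov = np.array([[xi0, xir], [xir, xi0]])
f1, f2 = rng.multivariate_normal([0.0, 0.0], cov, size=400_000).T

# Keep only the pairs whose f1 lies in a narrow bin around the target value.
target = 1.0
sel = np.abs(f1 - target) < 0.05
cond_mean = f2[sel].mean()           # should be close to c * target
cond_var = f2[sel].var()             # should be close to xi0 * (1 - c**2)
```

The estimates agree with $c f_1$ and $\xi_0(1 - c^2)$ only to within sampling noise and the finite bin width.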


3.12 Ricean Calculations 


Imagine one is monitoring a noisy radio receiver for pulses and one would like to estimate the 
frequency of spurious pulse detections simply due to the noise in the output. This problem first 
arose in telegraphy, and the solution to such problems was first clearly set out by Rice. 

A classical 'Ricean' calculation is to ask, given a Gaussian random time series $f(t)$, with some
specified auto-correlation function $\xi(\tau) = \langle f(t) f(t + \tau) \rangle$, what is the frequency of up-crossings of the
level $f = F$?

To answer this, consider the bi-variate probability element

$$ dp = p(f = F, f')\, df\, df' \tag{3.72} $$


with $f' = df/dt$, the time derivative of the field. This quantity tells us the fraction of time that the
following conditions are satisfied:

$$ F < f < F + df, \qquad f' < df/dt < f' + df'. \tag{3.73} $$

These conditions individually specify a set of intervals of time; the former of length $\Delta t \sim df/\sqrt{\langle f'^2 \rangle}$
and the latter of length $\Delta t' \sim df'/\sqrt{\langle f''^2 \rangle}$, and the combined conditions are satisfied within the
intersection of these interval sets. Let us choose the infinitesimals so that $\Delta t \ll \Delta t'$ (i.e. $df$ is an
infinitesimal of higher order than $df'$), so that the length of the interval is determined by the first
condition, and

$$ \Delta t = df/f'. \tag{3.74} $$

Define $n_{\rm up}(F, f')\,df'$ to be the frequency (number per unit time) of up-crossings with $f' < df/dt <
f' + df'$. Multiplying this by the length of the intervals must equal the probability element above:

$$ n_{\rm up}(F, f')\, df'\, \Delta t = p(f = F, f')\, df\, df' \tag{3.75} $$

and therefore

$$ n_{\rm up}(F, f') = |f'|\, p(f = F, f') \tag{3.76} $$

and the total rate of up-crossings is obtained by integrating over all $f' > 0$:

$$ n_{\rm up}(F) = \int_0^\infty df'\, |f'|\, p(F, f'). \tag{3.77} $$

This integration is straightforward. As we have seen, $\langle f f' \rangle = 0$, so the joint distribution factorizes
into two independent Gaussians $p(f, f') = p(f)\,p(f')$ with

$$ p(f) = (2\pi\xi_0)^{-1/2} \exp(-f^2/2\xi_0), \qquad p(f') = (2\pi(-\xi_0''))^{-1/2} \exp(-f'^2/2(-\xi_0'')) \tag{3.78} $$

(here $\xi_0'' \equiv \xi''(0)$, so that $\langle f'^2 \rangle = -\xi_0''$) and hence

$$ n_{\rm up}(F) = \frac{1}{2\pi} \sqrt{\frac{-\xi_0''}{\xi_0}}\, \exp(-F^2/2\xi_0). \tag{3.79} $$

The curvature of $\xi(\tau)$ at $\tau = 0$ therefore determines the characteristic time-scale $\tau_* = \sqrt{-\xi_0/\xi_0''}$
and the rate of up-crossings is essentially the rate $1/\tau_*$ times the Gaussian factor giving the relative
probability that the field has value $f = F$.

This type of calculation can be generalized to fields in multi-dimensional space, and can also be 
generalized to give the frequency of extrema, or of peaks, the latter playing a big role in cosmological 
structure formation theory. 
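The up-crossing rate $n_{\rm up}(F) = (2\pi)^{-1}\sqrt{-\xi''(0)/\xi(0)}\,\exp(-F^2/2\xi(0))$ can be compared with a direct count in a synthetic time series. The sketch below is an illustration only: the Gaussian-shaped power spectrum, the sampling interval, and the helper names `rice_rate` and `measured_rate` are assumptions. It uses the fact that $\xi(0)$ and $-\xi''(0)$ are the zeroth and second moments of the power spectrum.

```python
import numpy as np

rng = np.random.default_rng(2)
n, dt, tau = 2 ** 16, 0.1, 1.0
omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)   # angular frequencies of the modes
power = np.exp(-(omega * tau) ** 2)              # assumed power spectrum shape
power[0] = 0.0                                   # no mean component

# Fourier synthesis of a real Gaussian time series with this spectrum.
fk = np.sqrt(power) * (rng.normal(size=omega.size) + 1j * rng.normal(size=omega.size))
f = np.fft.irfft(fk, n=n)

# Spectral moments; their overall normalization cancels in the ratio below.
xi0 = power.sum()
minus_xi0_pp = (omega ** 2 * power).sum()

def rice_rate(level_in_sigma):
    """Predicted up-crossing rate for the level F = level_in_sigma * sqrt(xi_0)."""
    return np.sqrt(minus_xi0_pp / xi0) / (2.0 * np.pi) * np.exp(-0.5 * level_in_sigma ** 2)

def measured_rate(level_in_sigma):
    """Count transitions from below to above the level in the sampled series."""
    level = level_in_sigma * f.std()
    ups = np.count_nonzero((f[:-1] < level) & (f[1:] >= level))
    return ups / (n * dt)
```

Agreement is expected only at the level of the sampling noise in the count; the sampling interval must be short compared with $\tau_*$ so that few crossings are missed.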

As a second example, consider the distribution of heights of extrema. At first sight this is trivial, 
since this is surely just 

$$ p_{\rm ext}(F) = p(f = F | f' = 0) = p(f = F, f' = 0)/p(f' = 0) \tag{3.80} $$

right? And since the field and its first derivative are uncorrelated, $\langle f f' \rangle = 0$, we have $p(f = F, f' = 0) =
p(f = F)\,p(f' = 0)$ and therefore

$$ p_{\rm ext}(F) = p(f = F). \tag{3.81} $$

This is a very simple result, but also very puzzling, as it says that the distribution of field values at
extrema is the same as the unconstrained distribution of field values, whereas common sense seems
to indicate that the distribution of extremal values should be, well, more extreme. To bolster one's
confidence in this intuitive feeling consider the case of band limited noise with power spectrum $P(\omega)$
which vanishes outside of some narrow range of frequency of width $\delta\omega$ around some central frequency
$\omega_0$. A realization of such a process takes the form of a locally sinusoidal wave of frequency $\omega_0$, and
with slowly varying envelope. Over some interval of length $\delta t \ll 1/\delta\omega$, where the peak amplitude
has some nearly constant value $f_{\rm max}$, the mean square value of the field is $f_{\rm max}^2/2$ whereas the mean
square value of the field at extrema is just $f_{\rm max}^2$, which is surely inconsistent with $p_{\rm ext}(F) = p(f = F)$.

To see what is wrong with the above analysis one needs to examine more carefully what is meant
by $p(f = F, f' = 0)$. By itself, this is quite abstract, but multiplied by infinitesimals $df$, $df'$ the
meaning is clear:

$$ p(f = F, f' = 0)\, df\, df' \tag{3.82} $$

gives the fraction of time that the field and its derivative lie in the prescribed ranges, or the fraction
of time occupied by a set of intervals. Now in the vicinity of an extremum at $t_0$, say, the value of
the derivative is $f' = 0 + (t - t_0) f''$, so the length of the interval will be inversely proportional
to the second derivative, $\delta t = df'/f''$, which in turn is (anti)-correlated with the field $f$. The simple
conditional probability $p(f = F | f' = 0)$ gives the distribution of field values near extrema, but in
a way which gives more weight to those extrema with small $f''$.


3.13 Variance of the Median 

Consider the common situation: One has obtained a set of CCD images of some object, which one 
would like to average in order to beat down the noise. Unfortunately, the images contain not just 
an approximately Gaussian noise component, arising from photon counting statistics, but also a 
highly non-Gaussian noise component coming from cosmic rays, so one is tempted to take a median 
of the images rather than a straight average (which, in the absence of cosmic rays, and assuming 
homogeneous data, would be optimal). What is the penalty in terms of final variance for taking the 
median rather than the average? 

We wish to compute the variance of the median of $N$ independent random variates $x$. Denote the
parent probability distribution by $p(x)$ and the cumulative distribution by $P(x) = \int_{-\infty}^{x} dx'\, p(x')$.
Without loss of generality we can take the origin in $x$-space to lie at the median of the parent
distribution so that $P(0) = 1/2$.

Consider, for simplicity, the case of even $N$. The probability that the first $N/2$ samples lie below
some value $x$ while the last $N/2$ lie above $x$ is just

$$ p_{\rm median}(x) = (1 - P(x))^{N/2} P(x)^{N/2}. \tag{3.83} $$

To get the total probability (that $N/2$ samples lie below $x$) we sum over all combinations. This
introduces factorials, but these are independent of $x$ so the final result is proportional to the above
factor.

For large $N$ we expect the median of $N$ samples to lie very close to the median of the parent
population, so we make a Taylor series of $p(x)$ around $x = 0$:

$$ p(x) \simeq p_0 + p_0' x + \frac{1}{2} p_0'' x^2 + \ldots \tag{3.84} $$

with corresponding expansion for the cumulative distribution $P(x) = 1/2 + \int_0^x dx'\, p(x')$, or

$$ P(x) \simeq \frac{1}{2} + p_0 x + \frac{1}{2} p_0' x^2 + \ldots \tag{3.85} $$

Keeping the leading order terms gives

$$ p_{\rm median}(x) \propto (1 + 2p_0 x)^{N/2} (1 - 2p_0 x)^{N/2} = (1 - 4p_0^2 x^2)^{N/2} \simeq \exp(-2Np_0^2 x^2) \tag{3.86} $$


where in the last step we have assumed N is large. 

In this limit the distribution is a Gaussian: $p_{\rm median}(x) \propto \exp(-x^2/2\sigma^2_{\rm median})$ with median variance

$$ \sigma^2_{\rm median} = \frac{1}{4Np_0^2} \tag{3.87} $$

which depends only on $p_0 = p(x = 0)$. This is an example of a Gaussian distribution arising in a
case where the central limit theorem does not apply.

For the special case of a Gaussian parent distribution with variance $\sigma_0^2$ we have $p_0 = 1/\sqrt{2\pi\sigma_0^2}$
so the median variance is

$$ \sigma^2_{\rm median} = \frac{\pi}{2} \times \frac{\sigma_0^2}{N} \tag{3.88} $$

which states that the median variance is $\pi/2$ times larger than the variance of the mean.
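The factor $\pi/2$ is easy to check by simulation. The sketch below draws many sets of unit-variance Gaussian variates and compares the scatter of their medians with the scatter of their means; the set size of 100 and the number of trials are arbitrary assumptions, and at finite $N$ the ratio is expected to fall slightly below the asymptotic value.

```python
import numpy as np

rng = np.random.default_rng(3)
n_images, n_trials = 100, 20_000
samples = rng.normal(size=(n_trials, n_images))   # unit-variance Gaussian parent

var_mean = samples.mean(axis=1).var()             # close to 1 / n_images
var_median = np.median(samples, axis=1).var()
ratio = var_median / var_mean                     # approaches pi / 2 for large N
```

In the CCD example this corresponds to a noise penalty of roughly 25 percent in the standard deviation for taking the median instead of the mean.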


3.14 Problems 

3.14.1 Radiation autocorrelation function 

Radiation from a thermal source at temperature $T$ is passed through a band pass filter with (energy)
transmission function $T(\omega) = 1/\omega^2$ for $\omega_0 - \delta\omega < \omega < \omega_0 + \delta\omega$ and falls on a sensitive detector which
measures one component of the electric field $E(t)$ as a function of time. Assuming that $\omega_0 + \delta\omega \ll kT/\hbar$,
what is the normalised autocorrelation function of the measured electric field $c_E(\tau) = \langle E(t)E(t +
\tau) \rangle/\langle E(t)^2 \rangle$?

Chapter 4

Properties of Electromagnetic Radiation


In this chapter we introduce the specific intensity and other quantities that describe the energy 
flux associated with electromagnetic radiation. We show the relation between the intensity and the 
energy flux, momentum flux, radiation pressure and energy density. We discuss the constancy of the 
intensity along light rays. 


4.1 Electromagnetic Spectrum 


Astronomical observations mostly deal with electromagnetic radiation. Refraction, diffraction and
interference phenomena indicate that this radiation behaves as waves with wavelength $\lambda$ and
frequency $\nu$ related by

$$ \lambda = c/\nu \tag{4.1} $$

with $c$ the speed of light

$$ c = 3 \times 10^{10}\ {\rm cm/s}. \tag{4.2} $$

The photo-electric effect shows that energy is given to, or taken from, the radiation field in discrete
quanta (photons) with energy

$$ E = h\nu \tag{4.3} $$

where $h$ is Planck's constant

$$ h = 6.6 \times 10^{-27}\ {\rm erg\,s}. \tag{4.4} $$

Of great importance is thermal radiation as emitted by matter in thermodynamic equilibrium, for
which the characteristic photon energy is related to the temperature of the emitting material by

$$ T = E/k \tag{4.5} $$

where $k$ is Boltzmann's constant

$$ k = 1.38 \times 10^{-16}\ {\rm erg/K}. \tag{4.6} $$

Astronomers deal with radiation covering a wide range of frequencies, conventionally delineated
into various regimes:

• Gamma rays: $T \gtrsim 10^{10}$ K

• X-rays: $10^9\,{\rm K} \gtrsim T \gtrsim 10^6$ K

• UV: $10^6\,{\rm K} \gtrsim T \gtrsim 10^5$ K

• Optical: $T \sim 3 \times 10^4$ K

• IR: $10^4\,{\rm K} \gtrsim T \gtrsim 10^2$ K

• Radio: $10^2\,{\rm K} \gtrsim T \gtrsim 10^{-1}$ K

Figure 4.1: Illustration of the geometry for defining the specific intensity
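To connect these characteristic temperatures with more familiar frequencies and wavelengths, one can simply invert $h\nu = kT$ and $\lambda = c/\nu$. The following sketch uses rounded cgs constants; the function names are hypothetical.

```python
H_PLANCK = 6.626e-27   # erg s
K_BOLTZ = 1.381e-16    # erg / K
C_LIGHT = 2.998e10     # cm / s

def characteristic_frequency(temperature_k):
    """Frequency nu (Hz) at which h nu = k T."""
    return K_BOLTZ * temperature_k / H_PLANCK

def characteristic_wavelength(temperature_k):
    """Wavelength lambda = c / nu (cm) corresponding to h nu = k T."""
    return C_LIGHT / characteristic_frequency(temperature_k)

optical_wavelength_cm = characteristic_wavelength(3.0e4)
radio_frequency_hz = characteristic_frequency(0.1)
```

With these constants, $T = 3 \times 10^4$ K corresponds to a wavelength of about 480 nm, and $T = 10^{-1}$ K to a frequency of about 2 GHz.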


4.2 Macroscopic Description of Radiation 


4.2.1 The Specific Intensity 

The macroscopic properties of freely propagating radiation are well described by geometric optics in 
which radiative energy travels along ‘rays’. One can also picture this as streams of photons carrying 
energy moving ballistically. The flow of energy along these rays is the radiative flux. 

In general the radiative energy flux depends on direction $\Omega$ and on frequency $\nu$ (or wavelength
$\lambda$). We define the monochromatic specific intensity $I_\nu(\Omega)$ (also known as the brightness or surface
brightness) such that if we erect a small surface $dA$ perpendicular to the rays propagating in some
small range of directions $d\Omega$ (as illustrated in figure 4.1) then in time $dt$ those rays in a range of
frequencies $d\nu$ around $\nu$ will transport through the surface an amount of energy

$$ dE = I_\nu(\Omega)\, dA\, dt\, d\Omega\, d\nu. \tag{4.7} $$

The units of the intensity are therefore erg/cm$^2$/s/steradian/Hz.

The intensity provides a fairly complete description of the transport of energy via radiation. It
can be generalised to describe how the energy is distributed between the different polarization states
(see §7.5.3).


4.2.2 Energy Flux 

Consider a surface element with some arbitrary orientation, and rays in some small cone $d\Omega$ around
direction $\Omega$ as illustrated in figure 4.2. The area of the surface projected onto a plane perpendicular
to the rays is $dA_\perp = dA\cos\theta$, where $\theta$ is the angle between the rays and the normal to the surface,
so the differential flux (energy per unit area per unit time) for this range of directions is

$$ dF_\nu = I_\nu \cos\theta\, d\Omega. \tag{4.8} $$

The net flux is obtained by integrating over direction

$$ F_\nu = \int d\Omega\; I_\nu(\Omega) \cos\theta. \tag{4.9} $$

Multiplying some function of direction by the $n$th power of $\cos\theta$ and integrating is often termed
'taking the $n$th moment' of the function, so we can say that the net flux is the first moment of the
intensity. Note that the net flux is zero for isotropic radiation.

Note also that the energy flux is not truly intrinsic to the radiation field, since it depends on the
orientation of the surface element. Energy propagating 'downwards' through the surface counts as
negative energy flux.

Figure 4.2: Illustration of the geometry of the area and bundle of rays for calculating the contribution
to the net flux.

4.2.3 Momentum Flux 

Photons carry momentum $\mathbf{p} = \mathbf{n}E/c$, where $\mathbf{n}$ is a unit vector in the direction of the photon motion,
so the component of the momentum in the direction $d\mathbf{A}$ normal to a surface element is $p_\perp = |\mathbf{p}|\cos\theta$
and so the differential momentum flux is $dF_\nu \cos\theta/c$ and integrating over all directions gives the
momentum flux

$$ P_\nu = \frac{1}{c} \int d\Omega\; I_\nu \cos^2\theta \tag{4.10} $$

so the momentum flux is the second moment of the intensity. If the surface element is perpendicular
to the $x$-axis, for example, this quantity gives the rate at which the $x$ component of momentum is
being transported in the positive $x$ direction through the surface.

Perhaps surprisingly, the momentum flux does not vanish for isotropic radiation; rays propagating
up (down) are considered a positive (negative) flux of particles, but carry positive (negative)
momentum, so the two contributions add rather than cancel.
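The behaviour of the first and second moments for isotropic radiation can be checked by direct numerical integration over the sphere. The sketch below assumes an isotropic intensity and integrates $\cos^n\theta$ over solid angle using $\mu = \cos\theta$; the function name `angular_moment` and the grid size are illustrative choices.

```python
import numpy as np

def angular_moment(n, n_mu=200_000):
    """Integral of cos^n(theta) over the full sphere, using mu = cos(theta)."""
    # For an axisymmetric integrand d(Omega) = 2 pi d(mu); midpoint rule in mu.
    edges = np.linspace(-1.0, 1.0, n_mu + 1)
    mu = 0.5 * (edges[:-1] + edges[1:])
    dmu = edges[1] - edges[0]
    return 2.0 * np.pi * np.sum(mu ** n) * dmu

first = angular_moment(1)    # net energy flux per unit intensity: vanishes
second = angular_moment(2)   # momentum flux times c per unit intensity: 4 pi / 3
```

The odd moment cancels between the upward and downward hemispheres, while the even moment does not.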

All the above quantities are monochromatic and refer to a single frequency. One can obtain the
corresponding total quantities by integrating over frequency

$$ I(\Omega) = \int d\nu\; I_\nu(\Omega) \quad \text{etc.} \tag{4.11} $$

We have used subscript frequency above. One can also define analogous quantities with subscript
$\lambda$ such that e.g. $F_\lambda$ is the energy flux per unit range of wavelength around $\lambda$. These are related by
e.g.

$$ I_\nu\, d\nu = I_\lambda\, d\lambda \tag{4.12} $$

with $\lambda = c/\nu \rightarrow d\lambda = -c\,d\nu/\nu^2$ so, taking magnitudes,

$$ I_\lambda = \nu^2 I_\nu/c. \tag{4.13} $$

4.2.4 Inverse Square Law for Energy Flux 

Consider an isotropically emitting source. The net rate at which energy crosses an enclosing sphere
is independent of the radius of the sphere, and is equal to the product of $F$ and the area, hence

$$ F \propto 1/r^2. \tag{4.14} $$


4.2.5 Specific Energy Density 

We define the specific energy density as 


$$ u_\nu(\Omega) = \text{energy/volume/solid angle/frequency}. \tag{4.15} $$

Figure 4.3: Illustration of the relation between the specific intensity and specific energy density.


This is proportional to the intensity. To determine the constant of proportionality, consider a cylinder
of length $dl$ and cross-section area $dA$ as illustrated in figure 4.3. The energy enclosed (and in range
of direction $d\Omega$ around the axis and in $d\nu$ around $\nu$) is

$$ dE = u_\nu(\Omega)\, d\Omega\, dl\, dA\, d\nu \tag{4.16} $$

and this energy will pass out of the cylinder in a time $dt = dl/c$. However, we also have $I_\nu =
dE/(dt\, dA\, d\Omega\, d\nu)$ and hence

$$ u_\nu(\Omega) = I_\nu(\Omega)/c. \tag{4.17} $$

This is reasonable; since all photons move at the same velocity c, the rate at which photons in a 
certain range of directions cross a perpendicular surface is just equal to the photon number density 
times the speed of light. Similarly, the energy flux for rays in this range of directions — i.e. the 
brightness — is equal to the energy density times c. 

Integrating over direction gives

$$ u_\nu = \int d\Omega\; u_\nu(\Omega) = \frac{1}{c} \int d\Omega\; I_\nu \tag{4.18} $$

so the energy density is proportional to the zeroth moment of the intensity. Equivalently

$$ u_\nu = \frac{4\pi}{c} J_\nu \tag{4.19} $$

where

$$ J_\nu = \frac{1}{4\pi} \int d\Omega\; I_\nu(\Omega) \tag{4.20} $$

is the mean intensity.


4.2.6 Radiation Pressure 

For isotropic radiation the radiation pressure is

$$ P = u/3. \tag{4.21} $$

The simplest way to see this is to consider a single photon bouncing off the walls of a perfectly
reflecting box, compute the rate of transfer of momentum to one of the walls, and then average over
all possible directions for the photon (see figure 4.4).

Figure 4.4: Imagine a box with perfectly reflecting walls of side $L$ containing a single photon of energy
$E$. When the photon reflects off the wall at the right the transfer of momentum is $\Delta P = 2(E/c)\cos\theta$.
The time between reflections off this wall is $\Delta t = 2L/(c\cos\theta)$. The rate of transfer of momentum to
the wall per unit area (i.e. the pressure exerted by the radiation) is $\Delta P/(L^2 \Delta t) = (E/L^3)\cos^2\theta =
u\cos^2\theta$. For isotropic radiation, $\langle \cos^2\theta \rangle = 1/3$, so $P = u/3$.

More formally, though also more generally, the rate at which photons of a certain direction
bounce off an element of the wall is given by dividing the energy flux by the energy per photon

$$ \frac{dn}{dt} = \frac{I_\nu \cos\theta\, dA\, d\Omega\, d\nu}{h\nu}. \tag{4.22} $$


Here $\theta$ is the angle between the photon direction and the normal to the wall. In each such reflection
the $x$-component of the momentum is reversed, giving a momentum transfer to the wall $\Delta p =
2(E/c)\cos\theta$, so the rate at which momentum is transferred to the wall per unit area per unit
frequency is

$$ P_\nu = \frac{dp/dt}{dA\, d\nu} = \frac{2}{c} \int_{\cos\theta > 0} d\Omega\; I_\nu \cos^2\theta. \tag{4.23} $$

The radiation pressure is therefore proportional to the second moment of the intensity. Integrating
this over frequency gives the total radiation pressure. For isotropic radiation this is

$$ P = \int d\nu\; P_\nu = \frac{2}{c} \int d\nu\; I_\nu\; 2\pi \int_0^1 d\mu\, \mu^2 = \frac{u}{3} \tag{4.24} $$

(with $\mu = \cos\theta$), so the radiation pressure is one third of the energy density.

Another way to think about radiation pressure is to consider the change in energy of a box
containing standing waves of radiation if we slowly change the volume of the box. If we make a
fractional decrease $\epsilon$ in the linear size, so that $L \rightarrow L' = L(1 - \epsilon)$, then the wavelength scales in
the same way, and so the energy of each quantum increases as $h\nu \rightarrow h\nu' = h\nu/(1 - \epsilon) \simeq h\nu(1 + \epsilon)$
(ignoring terms of order $\epsilon^2$ and higher). Thus, if the number of quanta is conserved then the energy
also increases by this factor, and the change in energy is $\Delta E = \epsilon E$. This increase of energy must
come from work done in compressing the box against the radiation pressure. Since there are 6 walls,
and each moves a distance $\epsilon L/2$, we have $dW = \epsilon E = 6PL^2 \times \epsilon L/2$ and hence $P = E/(3L^3) = u/3$.


4.3 Constancy of Specific Intensity 


Consider two circular disks of areas $dA_1$ and $dA_2$ with separation $R$ and let both disks be normal
to their separation (see figure 4.5). Consider those rays which pass through both disks. The rate at
which energy passes through the disks can be written as

$$ \frac{dE}{dt} = I_{\nu 1}\, dA_1\, d\Omega_1\, d\nu = I_{\nu 2}\, dA_2\, d\Omega_2\, d\nu \tag{4.25} $$

but the solid angle of the cone of rays is $d\Omega_1 = dA_2/R^2$ and $d\Omega_2 = dA_1/R^2$ so

$$ I_{\nu 1} = I_{\nu 2} \tag{4.26} $$

so the specific intensity is conserved.

This is consistent with the inverse square law for the energy flux from an isotropically emitting
source, since the energy flux is the product of the intensity and the solid angle of the source, and
the latter varies inversely with radius squared.

This is for radiation propagating in free space. It also applies to radiation propagating through
(static) lenses. A telescope cannot change the intensity; what it does is increase the apparent solid
angle of the object and this increases the energy flux falling on the detector (see figure 4.6).

Figure 4.6: An illustration of how intensity is conserved even under extreme focusing of rays. The
intensity of light coming into the telescope is $I_1$. The energy per unit time entering the telescope
pupil is equal to $I_1 A_{\rm pupil}\, d\Omega$ where $d\Omega$ is the solid angle of the source. In front of the image (of area
$A_{\rm image} = L^2 d\Omega$) the intensity is $I_2$ and the radiation is extended over a solid angle $\Omega_{\rm beam} = A_{\rm pupil}/L^2$.
Equating the rate at which energy is deposited on the focal plane with the rate energy enters the
pupil tells us that $I_2 = I_1$.

One consequence of the constancy of specific intensity is that we can readily detect even very
distant objects such as galaxies and clusters of galaxies, provided they are big enough to be resolved.
Note that for cosmologically distant objects the surface brightness is not conserved; rather they
suffer from $(1 + z)^4$ dimming due to the red-shifting of the radiation. This is not in conflict with the
law of constancy of intensity for a static system since an expanding Universe is not static.

Constancy of intensity is also invoked in Olbers' paradox which says that in an infinite Universe
any line of sight should end on a star, so the sky should be bright, not dark. The 'resolution' of this
non-paradox involves the finite age of the Universe (there is generally a 'horizon' beyond which
we cannot see) and also the $(1 + z)^4$ dimming mentioned above.

We will see that the surface brightness of black body radiation is proportional to $T^4$. If a static
optical system could change the intensity, this would be equivalent to changing the temperature of
the radiation passing through the system. This difference in temperature would allow one to extract
useful work from a thermal system, which is forbidden; it would allow one to make a perpetual
motion machine.

Chapter 5

Thermal Radiation 


We now consider thermal radiation, by which we mean radiation emitted by matter which is in
thermal equilibrium, and black body radiation, which is radiation which is itself in thermal equilibrium
(e.g. the radiation inside a kiln, or deep inside a star).

The standard gedanken apparatus for generating black-body radiation is a cavity with perfectly 
reflecting walls and containing some specks of matter which can absorb and re-radiate the photons 
and thereby allow the radiation to reach equilibrium. (Since photons are massless and do not carry 
any conserved quantum numbers they can be created or destroyed in the interaction with matter). 
Our task here is to find the equilibrium distribution of photon energies. 

The resulting intensity will, of course, depend on how much energy we put in the cavity. The
equilibrium intensity must be isotropic and homogeneous in space, and so should therefore depend
only on the energy density. The intensity, which we will denote by $I_\nu = B_\nu(T)$, must be a universal
function of frequency parameterized only by the temperature $T$. Were this not the case, i.e. if the
equilibrium intensity were to depend on some other parameter (such as a magnetic field, say), then it
would be possible to extract useful work from an equilibrated system (see RL), but this is forbidden
by the second law of thermodynamics.

Several important properties of black-body radiation (the Stefan-Boltzmann law, the thermodynamic
entropy, and the adiabatic expansion laws) can be deduced from purely thermodynamical
considerations. We review these results first in §5.1. We then apply statistical mechanical
considerations to derive the detailed form of the Planck spectrum in §5.2. All of the results derived
thermodynamically can, of course, be obtained from the Planck spectrum.


5.1 Thermodynamics of Black Body Radiation 

5.1.1 Stefan-Boltzmann Law 

Consider a cylinder containing radiation (and a speck of matter in order to allow the radiation to 
equilibrate), with the possibility of heat input from some external source, and with a piston to 
connect the radiation mechanically to the outside world and allow the radiation to do work on its 
environment or vice versa. See figure 5.1.

Figure 5.1: A cylinder with perfectly reflecting walls (and perhaps a speck of dust to allow the
radiation to thermalize) contains black-body radiation. The piston allows the radiation to interact
mechanically with the outside world, and there is also an external source of heat.

The first law of thermodynamics relates changes $dU$ in the total energy $U$, heat input $dQ$ and
mechanical work done by the system on the outside world $P\,dV$:

$$ dU = dQ - P\,dV. \tag{5.1} $$

The change in the thermodynamic entropy of the system is

$$ dS = \frac{dQ}{T} = \frac{dU}{T} + \frac{P\,dV}{T}. \tag{5.2} $$

With $U = uV$, $P = u/3$ and energy density $u = u(T)$ we have $dU = d(uV) = u\,dV + V\,du$ and
therefore find

$$ dS = \left(\frac{V}{T}\frac{du}{dT}\right) dT + \left(\frac{4u}{3T}\right) dV. \tag{5.3} $$

However, since the total entropy is a function only of the temperature and volume, we also have

$$ dS = \frac{\partial S}{\partial T}\, dT + \frac{\partial S}{\partial V}\, dV. \tag{5.4} $$

Equating the coefficients of $dT$ and $dV$ in (5.3) and (5.4) shows that the partial derivatives of
the entropy are $\partial S/\partial T = (V/T)(du/dT)$ and $\partial S/\partial V = (4/3)(u/T)$, and equating $\partial^2 S/\partial V \partial T$ and
$\partial^2 S/\partial T \partial V$ yields

$$ \frac{du}{u} = 4\frac{dT}{T} \tag{5.5} $$

which has solution $u \propto T^4$ (we are assuming that $u \rightarrow 0$ as $T \rightarrow 0$). Introducing a constant of
proportionality $a$ gives us the Stefan-Boltzmann law

$$ u = aT^4. \tag{5.6} $$

The energy density is related to the brightness by $u = 4\pi B/c$ so we have

$$ B(T) = \frac{acT^4}{4\pi} \tag{5.7} $$

as an alternative statement of the Stefan-Boltzmann law.

Finally, the emergent flux density from a black-body radiator is

$$ F = \int_{\cos\theta > 0} d\Omega\; B\cos\theta = \pi B \tag{5.8} $$

and so

$$ F = \sigma T^4 \tag{5.9} $$

where the constants of proportionality in these various versions of the Stefan-Boltzmann law are

$$ a = \frac{4\sigma}{c} = 7.56 \times 10^{-15}\ {\rm erg\,cm^{-3}\,K^{-4}} \tag{5.10} $$

and

$$ \sigma = \frac{ac}{4} = 5.67 \times 10^{-5}\ {\rm erg\,cm^{-2}\,K^{-4}\,s^{-1}}. \tag{5.11} $$

These were originally determined empirically. Below we shall derive them in terms of the fundamental
constants $h$, $c$ and $k$.
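For reference, the quoted numerical values can be reproduced from the fundamental constants. The sketch below uses the standard closed-form result $a = 8\pi^5 k^4/(15 h^3 c^3)$, quoted here without derivation as an assumption relative to this section, together with $\sigma = ac/4$; the constants are rounded cgs values.

```python
import math

H_PLANCK = 6.62607e-27    # erg s
K_BOLTZ = 1.380649e-16    # erg / K
C_LIGHT = 2.99792458e10   # cm / s

# Radiation constant from the standard closed form (not derived in this section).
a_rad = 8.0 * math.pi ** 5 * K_BOLTZ ** 4 / (15.0 * H_PLANCK ** 3 * C_LIGHT ** 3)

# Stefan-Boltzmann constant from sigma = a c / 4.
sigma_sb = a_rad * C_LIGHT / 4.0
```

Both values agree with the numbers quoted above to the three significant figures given there.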


5.1.2 Entropy of Black-Body Radiation 

Start with a cold cavity, with negligible energy, and apply heat at constant volume. The entropy 
can be obtained by integrating dS = dQ/T: 

$$ S = \int \frac{dQ}{T} = V \int \frac{du}{T} = aV \int \frac{d(T^4)}{T} = 4aV \int dT\; T^2 \tag{5.12} $$

or

$$ S = \frac{4}{3} aVT^3. \tag{5.13} $$

The entropy scales with the volume, as one could easily have concluded from the fact that $Q$ is an
extensive quantity, and it scales as the cube of the temperature.

The radiation energy is $U = aVT^4$. If we assert that the typical photon energy is $E \sim kT$ then
the number of photons in a black-body cavity is

$$ N \sim \frac{aVT^4}{kT} = (a/k)VT^3 \tag{5.14} $$

so, for black-body radiation, the entropy is just proportional to the number of photons.


5.1.3 Adiabatic Expansion Laws 

We can now compute how the temperature and radiation pressure vary if we change the volume
while keeping the system thermally isolated from the external world ($dQ = 0$). At constant $S$ we
have

$$ T \propto V^{-1/3} \quad \text{and} \quad P \propto T^4 \propto V^{-4/3}. \tag{5.15} $$

The adiabatic equation of state for black-body radiation is therefore

$$ PV^\gamma = \text{constant} \tag{5.16} $$

with adiabatic index $\gamma = 4/3$.

The above results are consistent with the picture of the radiation as a conserved number of
photons, with wavelength scaling as the linear dimension of the cavity $\lambda \propto L \propto V^{1/3}$, so the energy
of each photon scales as $E \propto h\nu \propto 1/L$, and since the characteristic energy is $E \sim kT$ this means
$T \propto E \propto V^{-1/3}$.

This tells us that black-body radiation remains black-body under adiabatic expansion — it does 
not require any matter to maintain this form. 


5.2 Planck Spectrum 

We now derive the Planck spectrum $B_\nu(T)$. This involves two steps: first we compute the density
of states as a function of frequency, and then we compute the mean energy per state using the
Boltzmann formula. After discussing what this means in terms of occupation numbers, we combine
these to obtain the energy density $u_\nu(\Omega)$ and $B_\nu(T)$. Finally, we discuss some general properties of
the Planck spectrum and the various 'characteristic temperatures' that are observationally useful.

Figure 5.2: Left panel shows an idealised cubical box of side $L$. Also indicated is a standing wave
(actually the fundamental mode). On the right is the corresponding lattice of states in wave-number
space. As the box becomes very large, the spacing of the states becomes very fine. The number of
states in some volume $(\Delta k)^3$ of $k$-space is $2 \times L^3 (\Delta k)^3/(2\pi)^3$.


5.2.1 Density of States 

Consider a large cubical box of side $L$ with perfectly reflecting walls, as illustrated in figure 5.2. The
states of the electromagnetic field consistent with these boundary conditions are a set of standing
waves, labelled by a set of three numbers $(n_x, n_y, n_z)$ giving the number of periods in the $x$, $y$, and
$z$ dimensions respectively. Equivalently, since $L$ is considered to be constant here, the states can be
labelled by the wave-number vector $\mathbf{k} = (2\pi/L)\mathbf{n}$ which has amplitude $|\mathbf{k}| = 2\pi/\lambda = 2\pi\nu/c$. The
momentum of a photon with this wavelength is $\mathbf{p} = h\mathbf{k}/2\pi = \hbar\mathbf{k}$.

These standing wave states form a regular cubical lattice in $k$-space with spacing $dk = 2\pi/L$. As
$L \rightarrow \infty$, the spacing $dk$ shrinks to zero.

Consider a finite volume in wave-number space $(\Delta k)^3$. The number of states in this volume is

$$ N_{\rm states} = 2 \times (\Delta k)^3/dk^3 = 2L^3 (\Delta k)^3/(2\pi)^3 \tag{5.17} $$

where the factor 2 arises because there are two polarization states for each allowed oscillation mode.

The number of states $N_{\rm states}$ is proportional to the product of the real space volume element
$V = L^3$ and the $k$-space volume element $(\Delta k)^3$. There is therefore a well defined and constant
density of states in 6-dimensional space:

$$ \frac{N_{\rm states}}{\text{volume} \times k\text{-volume}} = \frac{2}{(2\pi)^3}. \tag{5.18} $$

Consider the modes with wave normal $\mathbf{k}$ in some element $d\Omega$ of solid angle and with $|\mathbf{k}|$ in the
range $k$ to $k + dk$. The $k$-space volume is $k^2\, dk\, d\Omega = (2\pi)^3 \nu^2\, d\nu\, d\Omega/c^3$, so the number of states in this
volume element is $N_{\rm states} = 2L^3 \nu^2\, d\nu\, d\Omega/c^3$ and therefore the density of states (in volume-frequency
space) is

$$ \rho = N_{\rm states}/\text{volume}/\text{solid angle}/\text{frequency} = 2\nu^2/c^3. \tag{5.19} $$
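The continuum density of states can be checked against a brute-force count of lattice points. Using $\mathbf{k} = (2\pi/L)\mathbf{n}$, the number of states with $|\mathbf{n}|$ below some radius $n_{\rm max}$ should approach $2 \times (4\pi/3)\, n_{\rm max}^3$, which is $2L^3/(2\pi)^3$ times the volume of the corresponding sphere in $k$-space. The helper names below are hypothetical and the sketch is only illustrative.

```python
import numpy as np

def count_modes(n_max):
    """Count lattice states with |n| < n_max, with two polarizations each."""
    axis = np.arange(-n_max, n_max + 1)
    nx, ny, nz = np.meshgrid(axis, axis, axis, indexing="ij")
    inside = nx ** 2 + ny ** 2 + nz ** 2 < n_max ** 2
    return 2 * int(np.count_nonzero(inside))

def predicted_modes(n_max):
    """Continuum estimate: 2 L^3 / (2 pi)^3 times the k-space volume of the sphere."""
    return 2.0 * (4.0 / 3.0) * np.pi * n_max ** 3

counted = count_modes(30)
predicted = predicted_modes(30)
```

The fractional difference shrinks as $n_{\rm max}$ grows, which is the sense in which the density of states becomes well defined for a large box.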


5.2.2 Mean Energy per State 

The next step is to compute the mean energy for an oscillator of a given spatial frequency $\mathbf{k}$ (and
polarization). According to elementary quantum mechanics, a simple harmonic oscillator has quantized
energy levels

$$ E = (n + 1/2)h\nu. \tag{5.20} $$

Henceforth we shall ignore the constant $h\nu/2$ ground state energy.

These oscillator states can be thought of as distinguishable particles, and we can therefore apply
the Boltzmann formula to say that the probability to find an oscillator with energy $E$ is proportional
to $\exp(-\beta E) = \exp(-E/kT)$. The mean energy is

$$ \bar E = \frac{\sum_{n=0}^{\infty} E e^{-\beta E}}{\sum_{n=0}^{\infty} e^{-\beta E}} = -\frac{\partial}{\partial\beta} \ln \sum_{n=0}^{\infty} e^{-\beta E} \tag{5.21} $$

but the sum here is a simple geometric series

$$ \sum_{n=0}^{\infty} e^{-\beta E} = \sum_{n=0}^{\infty} e^{-nh\nu\beta} = (1 - e^{-\beta h\nu})^{-1} \tag{5.22} $$

from which we obtain

$$ \bar E = \frac{h\nu\, e^{-\beta h\nu}}{1 - e^{-\beta h\nu}} = \frac{h\nu}{e^{\beta h\nu} - 1}. \tag{5.23} $$


5.2.3 Occupation Number in the Planck Spectrum 

If we divide the mean energy $\bar E$ by the energy per photon $h\nu$ we obtain the mean occupation number

$$ \bar n = \frac{1}{e^{\beta h\nu} - 1} \tag{5.24} $$

(see figure 5.3). This has the following asymptotic dependence on frequency:

• $h\nu \ll kT$: expanding the exponential as a Taylor series, the leading order term is $\bar n \simeq kT/h\nu \gg 1$.
In this 'Rayleigh-Jeans' region the occupation number is large; each state contains a large
number of photons, and photon discreteness effects are negligible. The occupation number
diverges as $\nu \rightarrow 0$, but the mean energy remains finite: $\bar E \rightarrow kT$.

• $h\nu \sim kT$: this is the characteristic photon energy, and the occupation number is of order unity.

• $h\nu \gg kT$: the $-1$ in the denominator is negligible, so $\bar n \simeq \exp(-h\nu/kT)$, which becomes
exponentially small.


• The occupation number n derived here is the equilibrium distribution for massless bosonic 
particles. 
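
The two asymptotic regimes can be seen directly by evaluating the occupation number as a function of $x = h\nu/kT$. This is a small illustrative sketch; the function name is an assumption of the example.

```python
import math

def occupation_number(x):
    """Mean occupation number (5.24) as a function of x = h*nu / kT.

    math.expm1 keeps the result accurate when x is very small.
    """
    return 1.0 / math.expm1(x)

# Rayleigh-Jeans regime: n is close to 1/x = kT / (h*nu), i.e. much greater than 1.
n_low = occupation_number(1e-3)

# Wien regime: n is close to exp(-x), i.e. exponentially small.
n_high = occupation_number(20.0)
```

The crossover between the two regimes sits at $x \sim 1$; for example the occupation number is exactly one at $x = \ln 2$.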

Figure 5.3: Left panel shows the occupation number for the Planck function. Right panel shows
the brightness as a function of frequency for black bodies of various temperatures.


5.2.4 Specific Energy Density and Brightness 

Multiplying the number of states $dN = (2\nu^2/c^3)\, dV\, d\nu\, d\Omega$ by the mean energy for a state (5.23)
yields

$$ dE = u_\nu(\Omega)\, dV\, d\nu\, d\Omega = (2\nu^2/c^3) \frac{h\nu}{e^{h\nu/kT} - 1}\, dV\, d\nu\, d\Omega \tag{5.25} $$

so the specific energy density is

$$ u_\nu(\Omega) = \frac{2h\nu^3/c^3}{e^{h\nu/kT} - 1} \tag{5.26} $$

and the brightness $B_\nu = I_\nu = c u_\nu$ is

$$ B_\nu = \frac{2h\nu^3/c^2}{e^{h\nu/kT} - 1}. \tag{5.27} $$

The Planck function grows as $B_\nu \propto \nu^2$ for $h\nu \ll kT$ and then falls rather abruptly due to the
exponential for $h\nu \gg kT$ (see figure 5.3).

This is consistent with the $u \propto T^4$ behavior deduced from thermodynamic considerations. The
total energy density is $u = 4\pi \int d\nu\, B_\nu / c$. This integral is dominated by modes around the
characteristic frequency $\nu_* = kT/h$, and the value of the integrand at the peak is on the order of
$B_{\rm max} \sim h\nu_*^3/c^2 \sim (kT)^3/(h^2 c^2)$, so the integral is on the order of $u \sim \nu_* B_{\rm max}/c \sim (kT)^4/(h^3 c^3)$,
which is proportional to $T^4$.

Note that the energy density can also be written as $u \sim kT/\lambda^3$, where $\lambda = c/\nu_*$ is the characteristic
wavelength. Equivalently, the number density of photons with energy $h\nu \sim kT$ is on the order of $n \sim 1/\lambda^3$.


5.3 Properties of the Planck Spectrum 


5.3.1 Rayleigh-Jeans Law

As discussed, for $h\nu \ll kT$ the argument of the exponential is small, and expanding gives mean
energy per state $\bar E \simeq kT$, so we can say that these modes are ‘in equipartition’. The intensity is

$$ B_\nu^{\rm RJ} = 2\nu^2 kT/c^2 \tag{5.28} $$

5.3.2 Wien Law

In the other extreme $h\nu \gg kT$ we have

$$ B_\nu^{\rm Wien} = \frac{2h\nu^3}{c^2} e^{-h\nu/kT} \tag{5.29} $$

which falls sharply with increasing frequency.

5.3.3 Monotonicity with Temperature 

From (5.27) we find $\partial B_\nu(T)/\partial T > 0$ for all $\nu$. Thus $B_\nu(T' > T)$ lies everywhere above $B_\nu(T)$.

5.3.4 Wien Displacement Law

The brightness $B_\nu$ peaks at $h\nu_{\rm max} = 2.82\, kT$ so $\nu_{\rm max} = 5.88 \times 10^{10}\,{\rm Hz}\, (T / 1\,{\rm K})$. In wavelength, $B_\lambda$
peaks at $\lambda_{\rm max} T = 0.29$ cm K.
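
The numerical coefficients in the displacement law follow from maximizing $x^3/(e^x - 1)$ (for $B_\nu$) and $x^5/(e^x - 1)$ (for $B_\lambda$), where $x = h\nu/kT$. Setting the derivative to zero gives $x = p\,(1 - e^{-x})$ with $p = 3$ or $5$. The sketch below solves this by fixed-point iteration; the function name, the iteration count and the quoted CGS constants are assumptions of the example rather than part of the text.

```python
import math

# CGS values of the fundamental constants (assumed here for illustration).
H = 6.62607015e-27    # Planck constant [erg s]
K = 1.380649e-16      # Boltzmann constant [erg / K]
C = 2.99792458e10     # speed of light [cm / s]

def peak_x(power, n_iter=100):
    """Solve x = power * (1 - exp(-x)), the condition for the maximum of
    x**power / (exp(x) - 1), by simple fixed-point iteration."""
    x = float(power)
    for _ in range(n_iter):
        x = power * (1.0 - math.exp(-x))
    return x

x_nu = peak_x(3)                         # peak of B_nu in units of kT / h
x_lam = peak_x(5)                        # peak of B_lambda, x = h c / (lambda k T)
nu_max_per_kelvin = x_nu * K / H         # [Hz / K]
lam_max_times_T = H * C / (x_lam * K)    # [cm K]
```

Note that the two peaks do not correspond to the same photon energy, since $B_\nu$ and $B_\lambda$ are densities with respect to different variables.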


5.3.5 Radiation Constants 


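The constants occurring in the various versions of the Stefan-Boltzmann law can be computed in
terms of the ‘fundamental’ constants $h$, $k$, $c$. The total brightness is

$$ B = \int d\nu\, B_\nu(T) = \frac{2h}{c^2} \left(\frac{kT}{h}\right)^4 \int_0^\infty \frac{x^3\, dx}{e^x - 1} \tag{5.30} $$

The value of the dimensionless integral here is $\pi^4/15$ so

$$ B = \frac{2\pi^4 k^4}{15 c^2 h^3}\, T^4 \tag{5.31} $$

and hence, using $F = \pi B = \sigma T^4$ and $u = 4\pi B/c = a T^4$,

$$ \sigma = \frac{2\pi^5 k^4}{15 c^2 h^3} \quad {\rm and} \quad a = \frac{8\pi^5 k^4}{15 c^3 h^3}. \tag{5.32} $$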
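
As a numerical cross-check, one can evaluate the dimensionless integral by quadrature and compute the constants in CGS units. The sketch below uses a plain midpoint rule; the cut-off `x_max`, the number of steps and the quoted values of $h$, $k$, $c$ are assumptions of the example.

```python
import math

# CGS values of the fundamental constants (assumed here for illustration).
H = 6.62607015e-27    # Planck constant [erg s]
K = 1.380649e-16      # Boltzmann constant [erg / K]
C = 2.99792458e10     # speed of light [cm / s]

def bose_integral(x_max=50.0, n=20000):
    """Midpoint-rule estimate of the integral of x^3 / (e^x - 1) from 0 to x_max.

    The integrand is exponentially small beyond x_max = 50, so the
    truncation error is negligible.
    """
    dx = x_max / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += x**3 / math.expm1(x)
    return total * dx

integral = bose_integral()                              # compare with pi^4 / 15
sigma = 2.0 * math.pi**5 * K**4 / (15.0 * C**2 * H**3)  # [erg cm^-2 s^-1 K^-4]
a = 4.0 * sigma / C                                     # [erg cm^-3 K^-4]
```

The relation `a = 4 * sigma / C` is simply the ratio of the two expressions in (5.32).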


5.4 Characteristic Temperatures 


5.4.1 Brightness Temperature 


If we observe a brightness $I_\nu$ at some frequency $\nu$ then we can define a ‘brightness temperature’ $T_b$
such that $I_\nu = B_\nu(T_b)$.

This is simply computed in the RJ regime:

$$ T_b = \frac{c^2}{2 k \nu^2} I_\nu \tag{5.33} $$


This requires that the source be resolved in order for I v to be defined, and will give a reliable 
estimate of the temperature if the source is optically thick. 
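
To illustrate when the Rayleigh-Jeans expression is adequate, the sketch below compares it with the brightness temperature obtained by inverting the full Planck function (5.27). The function names are illustrative, the constants are CGS values assumed for the example, and the exact inversion is a straightforward rearrangement of (5.27) rather than a formula quoted in the text.

```python
import math

# CGS values of the fundamental constants (assumed here for illustration).
H = 6.62607015e-27    # Planck constant [erg s]
K = 1.380649e-16      # Boltzmann constant [erg / K]
C = 2.99792458e10     # speed of light [cm / s]

def planck_b_nu(nu, T):
    """Planck brightness B_nu(T), equation (5.27)."""
    return 2.0 * H * nu**3 / C**2 / math.expm1(H * nu / (K * T))

def brightness_temperature_rj(I_nu, nu):
    """Rayleigh-Jeans brightness temperature, equation (5.33)."""
    return C**2 * I_nu / (2.0 * K * nu**2)

def brightness_temperature_exact(I_nu, nu):
    """Brightness temperature from inverting the full Planck function."""
    return H * nu / K / math.log1p(2.0 * H * nu**3 / (C**2 * I_nu))

# A 100 K black body observed at 1 GHz (h*nu << kT) and at 10 THz (h*nu > kT).
T_true = 100.0
tb_radio = brightness_temperature_rj(planck_b_nu(1e9, T_true), 1e9)
tb_infrared = brightness_temperature_rj(planck_b_nu(1e13, T_true), 1e13)
```

At 1 GHz the Rayleigh-Jeans estimate recovers the true temperature to better than a part in a thousand, whereas at 10 THz it falls far below it, so the full inversion is needed outside the RJ regime.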


5.4.2 Color Temperature 

If we can deduce the characteristic frequency $\nu_*$, say from broad band observations at a range of
frequencies, then we can define the ‘color temperature’

$$ T_{\rm color} = h\nu_*/k. \tag{5.34} $$

The color temperature does not require the source to be resolved or optically thick. 












5.4.3 Effective Temperature 

If a bolometer provides the total flux density $F$ but does not provide any detailed frequency
distribution information, one can deduce the temperature if the solid angle $d\Omega$ subtended by the
source is known, by equating

$$ B = \frac{F}{d\Omega} = \frac{ac}{4\pi}\, T_{\rm eff}^4 \tag{5.35} $$

This requires that the source be resolved.


5.5 Bose-Einstein Distribution 

The Planck spectrum derived above is the fully equilibrated distribution function for photons, and
in the derivation we placed no restrictions on the occupation numbers, implicitly assuming that
reactions which create or destroy photons are efficient. Under certain circumstances (e.g. electron
scattering) such reactions may be inefficient, though scattering may be able to efficiently redistribute
energy among a fixed number of photons. In this case, the photon energy distribution will deviate
from the Planck spectrum, and one has the more general Bose-Einstein distribution, with mean
occupation number

$$ \bar n = \frac{1}{e^{(h\nu + \mu)/kT} - 1}. \tag{5.36} $$

The constant $\mu$ here is known as the chemical potential.

If the chemical potential is positive then this leads to a finite occupation number at zero energy.
Such a situation arises if there are too few photons for the given energy (i.e. fewer than for a Planck
spectrum of a given energy).

If the chemical potential is negative then the Bose-Einstein occupation number becomes infinite
at a finite frequency $h\nu = -\mu$, which is unphysical. Now one can vary the chemical potential by
supplying or removing heat (without allowing photon number changing reactions). If one were
to start with a photon gas with positive $\mu$ then one can decrease it by extracting heat from the
gas. Once $\mu$ reaches zero, any further extraction of heat will result in a Bose condensation with a
Planckian distribution for $E > 0$ and any excess photons in the zero energy state.

We will see how the Bose-Einstein distribution and its analog for fermions, the Fermi-Dirac
distribution, arise when we discuss kinetic theory in chapter 19.


5.6 Problems 

5.6.1 Thermodynamics of Black-Body Radiation 

The entropy $S$ of a system is defined by $dQ = T dS$ where $dQ$ is the heat input and $T$ is the
temperature.

a. Derive an expression for the entropy of black body radiation in an enclosure in terms of the
temperature $T$, the volume $V$, and the radiation constant $a$.

b. If one expands the enclosure adiabatically (i.e. with no heat input or output) how does the
temperature scale with the volume?

5.6.2 Black body radiation and adiabatic invariance 

a) Consider a perfectly reflecting cavity containing black body radiation. By invoking ‘adiabatic 
invariance’ (i.e. constancy of occupation number for each fundamental mode of oscillation) show 
that if the cavity (a balloon perhaps) expands isotropically, then the radiation maintains a black 
body spectrum. How does the temperature scale with the linear size of the cavity? 

b) Assume that the 3 K background radiation has an energy density smaller than the current
matter density by a factor $10^3$. What was the temperature of the background radiation at the epoch
when the matter and radiation had equal energy density?





c) Now consider a perfectly reflecting cavity which expands only in one direction (a cylinder 
capped by a piston for instance) and which initially contains black-body radiation. Again applying 
adiabatic invariance, compute by what factor the energy in the radiation decreases in the limit of a 
very large expansion factor. You should not assume that the cavity contains any matter to scatter 
the radiation and maintain isotropy and/or thermal equilibrium. 

d) Apply the same principle of constancy of occupation number to another simple harmonic 
oscillator: a simple pendulum undergoing small oscillations but with a slowly varying pendulum 
length L. How does the energy of the oscillator depend on frequency? How does the amplitude of 
the oscillations depend on frequency? 

5.6.3 Black-body radiation 

Derive the Planck spectrum for a thermal black-body radiation field. You should proceed in two 
steps: 

First compute the density of states in frequency space for a cubical cavity of side $L$ (don’t forget
to allow for the two independent polarisation states). Obtain an expression for $\rho$, the number of
states per unit volume per unit frequency per unit solid angle.

Next calculate the mean energy per state as follows. In thermal equilibrium at temperature $T$,
the probability distribution for the energy $E = nh\nu$ follows the Boltzmann law $p(E) \propto \exp(-\beta E)$
where $\beta = 1/kT$ and $k$ is Boltzmann’s constant. Thereby show that the mean energy of a mode is
given by $\bar E = -\partial/\partial\beta\, \ln(\sum_{n=0}^\infty e^{-\beta n h\nu})$. Evaluate the sum to obtain a closed expression for $\bar E$ and
combine with your expression for the density of states to obtain an expression for the specific energy
density $u_\nu(\Omega)$.

Integrate $u_\nu(\Omega)$ to obtain the Stefan-Boltzmann law for the total energy density: $u = aT^4$ and
express $a$ in terms of fundamental constants (you may use the result for the dimensionless integral
$\int dx\, x^3/(e^x - 1) = \pi^4/15$).

5.6.4 Planck Spectrum

A cubical cavity of side 1 m contains radiation at temperature $T = 300$ K. Give a rough estimate of
the characteristic wavelength of the radiation and estimate the number of photons in the enclosure.

Chapter 6

Radiative Transfer 


We now consider radiation passing through matter, which may absorb, emit and/or scatter radiation 
out of or into the beam. We will derive the equation governing the evolution of the intensity. Real 
scattering processes (e.g. electron scattering) are generally anisotropic and introduce polarization. 
Here we will consider only isotropic scatterers and unpolarized radiation. We will also limit attention 
to ‘elastic’ scattering in which there is negligible change in the photon energy in scattering (this is a 
reasonable approximation if the photon energy is much less than the rest mass of the scatterer and 
the latter has thermal or random velocity much less than the speed of light). 


6.1 Emission 


We define the (spontaneous) emission coefficient $j_\nu$ such that matter in a volume element $dV$ adds to
the radiation field an amount of energy

$$ dE = j_\nu\, dV\, d\Omega\, dt\, d\nu. \tag{6.1} $$

For isotropically emitting particles

$$ j_\nu = \frac{P_\nu}{4\pi} \tag{6.2} $$

where $P_\nu$ here is the radiated power per unit volume per unit frequency.

The (angle averaged) emissivity $\epsilon_\nu$ is the energy input per unit mass per unit time per unit
frequency and is defined such that

$$ dE = \epsilon_\nu\, \rho\, dV\, dt\, d\nu\, (d\Omega/4\pi) \tag{6.3} $$

with $\rho$ the mass density, from which follows the relation

$$ j_\nu = \frac{\epsilon_\nu \rho}{4\pi}. \tag{6.4} $$

Consider a cylindrical tube of cross-section area $dA$ and length $dl$. The radiation energy in a
co-axial cone of directions $d\Omega$ is

$$ E = u_\nu\, dA\, dl\, d\Omega\, d\nu. \tag{6.5} $$

In a time $dt = dl/c$ this tube will have moved through its length, and the matter in the volume will
have injected an extra amount of energy

$$ \delta E = j_\nu\, dA\, dl\, d\Omega\, d\nu\, dt \tag{6.6} $$

so $u'_\nu = (E + \delta E)/(dA\, dl\, d\Omega\, d\nu) = u_\nu + j_\nu dt$, or $du_\nu = j_\nu dt$ or, since $I_\nu = c u_\nu$,

$$ dI_\nu = j_\nu\, ds \tag{6.7} $$

where $ds$ is an infinitesimal path length element.

6.2 Absorption

Absorption will remove from a beam an amount of intensity proportional to the incident intensity
and proportional to the path length:

$$ dI_\nu = -\alpha_\nu I_\nu\, ds \tag{6.8} $$

where the absorption coefficient $\alpha_\nu$ has units of (length)$^{-1}$.

For a simple model of randomly placed absorbing spheres with cross-section $\sigma$ and number density
$n$ the mean covering fraction for objects in a tube of area $A$ and length $ds$ is $\delta A/A = n\sigma\, dV/A = n\sigma\, ds$
so we would expect an attenuation in intensity

$$ dI = -n\sigma I\, ds \tag{6.9} $$

so, for this model, $\alpha = n\sigma$.

One can also define the opacity $\kappa_\nu$ by

$$ \alpha_\nu = \rho \kappa_\nu \tag{6.10} $$

which is more convenient if one wishes to calculate the attenuation for propagation through a given 
column density of material. 


6.3 The Equation of Radiative Transfer 


Combining emission and absorption terms gives the equation of radiative transfer:

$$ \frac{dI_\nu}{ds} = -\alpha_\nu I_\nu + j_\nu \tag{6.11} $$

This is easy to solve if $j_\nu$ is given, but tricky in general since $j_\nu$ contains scattered radiation which
is proportional to the angle averaged intensity, so we obtain an integro-differential equation.
Solutions can easily be found for two special cases:

• Emission only:

$$ \frac{dI_\nu}{ds} = j_\nu \quad \rightarrow \quad I_\nu(s) = I_\nu(0) + \int ds\, j_\nu \tag{6.12} $$

• Absorption only:

$$ \frac{dI_\nu}{ds} = -\alpha_\nu I_\nu \quad \rightarrow \quad I_\nu(s) = I_\nu(0)\, e^{-\int ds\, \alpha_\nu} \tag{6.13} $$

It is useful to define an alternative dimensionless parameterization of path length called the
optical depth $\tau_\nu$ such that

$$ d\tau_\nu = \alpha_\nu\, ds \tag{6.14} $$

and in terms of which the pure absorption solution is $I_\nu(s) = I_\nu(0) e^{-\tau_\nu}$.

With both emission and absorption it is useful to define the source function

$$ S_\nu \equiv j_\nu / \alpha_\nu \tag{6.15} $$

in terms of which the RTE becomes

$$ \frac{dI_\nu}{d\tau_\nu} = -I_\nu + S_\nu \tag{6.16} $$

and the formal solution of the RTE is

$$ I_\nu(\tau) = I_\nu(0)\, e^{-\tau} + \int_0^\tau e^{-(\tau - \tau')} S_\nu(\tau')\, d\tau' \tag{6.17} $$

which can be verified by direct differentiation. The utility of this solution is limited by the fact that 
the source function is not given, but must usually be determined as an integral over the intensity. 
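
When the source function is known, the formal solution is straightforward to evaluate. The sketch below integrates (6.17) with the trapezoidal rule and, for the special case of a constant source function, compares against the closed form $I(\tau) = I(0) e^{-\tau} + S (1 - e^{-\tau})$ that follows by doing the integral analytically. The function names and step count are assumptions of the example.

```python
import math

def formal_solution(I0, source, tau, n_steps=2000):
    """Evaluate the formal solution (6.17) by trapezoidal quadrature.

    `source` is a callable giving the source function S(tau') along the ray.
    """
    d_tau = tau / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        tau_p = i * d_tau
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * math.exp(-(tau - tau_p)) * source(tau_p)
    return I0 * math.exp(-tau) + total * d_tau

def constant_source_solution(I0, S, tau):
    """Closed form of (6.17) for a constant source function S."""
    return I0 * math.exp(-tau) + S * (1.0 - math.exp(-tau))

numeric = formal_solution(5.0, lambda t: 2.0, 3.0)
exact = constant_source_solution(5.0, 2.0, 3.0)
```

For large optical depth the incident intensity is forgotten and the emergent intensity tends to the source function, as `constant_source_solution(5.0, 2.0, 50.0)` illustrates.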

6.4 Kirchhoff’s Law

Consider a kiln containing some matter and radiation in thermal equilibrium at temperature $T$. The
intensity obeys the RTE:

$$ \frac{dI_\nu}{d\tau} = S_\nu - I_\nu \tag{6.18} $$

but in equilibrium the intensity must be spatially constant, so $dI_\nu/d\tau = 0$, and hence $S_\nu = I_\nu
= B_\nu(T)$, i.e. the source function and intensity are equal. Since the source function is defined as
$S_\nu = j_\nu/\alpha_\nu$, this means that

$$ j_\nu = \alpha_\nu B_\nu(T) \tag{6.19} $$

which is Kirchhoff’s law. This tells us, for instance, that if some material absorbs well at some
particular wavelength (perhaps because of a resonance) then it will also radiate well at the same
wavelength.


6.5 Mean Free Path 

We can compute the probability distribution for photon paths and the mean free path as follows.
The probability that a photon is absorbed within an infinitesimal distance $ds$ is

$$ P_{\rm abs} = \alpha\, ds \tag{6.20} $$

so the probability it survives is

$$ P_{\rm survive}(ds) = 1 - \alpha\, ds \tag{6.21} $$

and the probability it survives at least a finite distance $s$ (i.e. it survives a succession of $N = s/ds$
steps of length $ds$) is, in the limit $N \to \infty$,

$$ P_{\rm survive}(> s) = (1 - \alpha\, ds)^N = (1 - \alpha s/N)^N = \exp(-\alpha s) \tag{6.22} $$

This is the cumulative distribution function. The differential distribution function for path
lengths is

$$ P(s) = -\frac{dP_{\rm survive}(> s)}{ds} = \alpha \exp(-\alpha s) \tag{6.23} $$

so the distribution of path lengths is exponential with typical path length $\sim 1/\alpha$ as might have been
expected.

The mean path length is

$$ \langle s \rangle = \int ds\, s P(s) = 1/\alpha. \tag{6.24} $$
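
The exponential path-length distribution is easy to sample: setting the survival probability (6.22) equal to a uniform random number $U$ and solving for $s$ gives $s = -\ln U/\alpha$. The sketch below draws paths this way and checks the mean; the function name, the seed and the value of $\alpha$ are arbitrary choices made for illustration.

```python
import math
import random

def sample_path_lengths(alpha, n, seed=0):
    """Draw n photon path lengths from P(s) = alpha * exp(-alpha * s)
    by inverting the survival probability (6.22)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the logarithm is always finite.
    return [-math.log(1.0 - rng.random()) / alpha for _ in range(n)]

alpha = 2.0                                   # absorption coefficient [1 / length]
paths = sample_path_lengths(alpha, 200_000)
mean_path = sum(paths) / len(paths)           # should be close to 1 / alpha
frac_beyond_mfp = sum(s > 1.0 / alpha for s in paths) / len(paths)
```

The fraction of photons travelling further than one mean free path should be close to $e^{-1}$, in line with (6.22).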


6.6 Radiation Force 

Radiation incident on scattering/absorbing material will result in transfer of momentum to the
matter. We now calculate this assuming that the scattered or re-radiated energy is emitted isotropically
in the rest frame of the matter. Consider the radiation in a narrow cone of directions $d\Omega$ about the
unit vector $\hat{\mathbf n}$ and in a co-axial cylinder of length $dl$ and cross-section area $dA$. The energy in the
cylinder is

$$ E = u\, dl\, dA\, d\Omega. \tag{6.25} $$

In a time $dt = dl/c$ this tube will travel its length, and absorbers and/or scatterers will have reduced
the energy content by an amount

$$ \delta E = (\alpha\, dl) E \tag{6.26} $$

with a corresponding transfer of momentum

$$ \delta \mathbf{p} = \hat{\mathbf n}\, \alpha\, dl\, E/c = \hat{\mathbf n}\, \alpha\, u\, dt\, dV\, d\Omega = \hat{\mathbf n}\, \alpha \frac{I}{c}\, dt\, dV\, d\Omega \tag{6.27} $$

so

$$ \frac{\rm momentum}{{\rm volume} \times {\rm time}} = \frac{\rm force}{\rm volume} = \frac{1}{c} \int d\nu\, \alpha_\nu \mathbf{F}_\nu \tag{6.28} $$

where $\mathbf{F}_\nu = \int d\Omega\, \hat{\mathbf n} I_\nu$ is the radiation flux vector.
The acceleration of the material is

$$ \frac{\rm force}{\rm mass} = {\rm acceleration} = \frac{1}{c} \int d\nu\, \kappa_\nu \mathbf{F}_\nu \tag{6.29} $$

Note that these results assume that the scattering process is front-back symmetric.


6.7 Random Walks 

As discussed above, the RTE with scattering is an integro-differential equation and it is therefore
quite hard to find exact solutions for most problems of interest.

However, order-of-magnitude estimates of some important features can be obtained using random
walk arguments. The basic idea here is that if the typical distance between scatterings is $l$ say (i.e. the
mean free path) and if each scattering gives a random change in direction then, after $N$ scatterings,
a photon will have traversed a net distance on the order of $l_* = \sqrt{N}\, l$.

A simple way to derive this famous ‘drunkard’s walk’ law is to consider a simple 1-dimensional
walk, where at each step the particle can move forward or backward one unit. Let a particle have
reached position $x_n$ after $n$ steps. After the next step it will be at $x_{n+1} = x_n + 1$ or $x_{n+1} = x_n - 1$
with equal probability of 1/2. The mean square displacement after $n + 1$ steps given that the particle
is at $x_n$ after $n$ steps is

$$ \langle x_{n+1}^2 | x_n \rangle = \frac{1}{2} \left[ (x_n + 1)^2 + (x_n - 1)^2 \right] = x_n^2 + 1 \tag{6.30} $$

so the average increase of $\langle x^2 \rangle$ in one step is unity, regardless of the value of $x_n$. More formally, the
unconstrained mean square of $x_{n+1}$ is given by integrating over all possible values for $x_n$:

$$ \langle x_{n+1}^2 \rangle = \int dx_n\, p(x_n) \langle x_{n+1}^2 | x_n \rangle = \int dx_n\, p(x_n) (x_n^2 + 1) = \langle x_n^2 \rangle + 1 \tag{6.31} $$

so the mean square displacement increases by unity for each step and since the particle starts at
$x_0 = 0$ we have $\langle x_n^2 \rangle = n$.

Alternatively, and in 3-dimensions, one can write the net vector displacement $\mathbf{R}$ as

$$ \mathbf{R} = \mathbf{r}_1 + \mathbf{r}_2 + \ldots + \mathbf{r}_N \tag{6.32} $$

The mean vector displacement vanishes,

$$ \langle \mathbf{R} \rangle = N \langle \mathbf{r} \rangle = 0 \tag{6.33} $$

but the mean square displacement is

$$ \langle R^2 \rangle = \langle r_1^2 \rangle + 2 \langle \mathbf{r}_1 \cdot \mathbf{r}_2 \rangle + \ldots + \langle r_2^2 \rangle + \ldots \tag{6.34} $$

Now all the cross terms like $\langle \mathbf{r}_1 \cdot \mathbf{r}_2 \rangle$ vanish (since the directions of different path segments are assumed
to be uncorrelated), and we have

$$ \langle R^2 \rangle = N \langle r^2 \rangle. \tag{6.35} $$

As an example, consider the escape of a photon from a cloud of size $L$. If the mean free path
is $l \ll L$ then the optical depth of the cloud is $\tau \sim L/l$. In $N$ steps, the photon will travel a distance
$l_* \sim \sqrt{N}\, l$. Equating this with the size of the cloud $L$ yields the required number of steps for escape
$N \sim \tau^2$.
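
The scaling $\langle R^2 \rangle = N \langle r^2 \rangle$ can be checked with a small Monte Carlo experiment. The sketch below assumes steps of fixed length in isotropically distributed, uncorrelated directions, which is a simplification (real photon paths have exponentially distributed lengths); the function names, seed and sample sizes are arbitrary choices for the example.

```python
import math
import random

def random_walk_r2(n_steps, step, rng):
    """Squared net displacement after n_steps isotropic steps of fixed length."""
    x = y = z = 0.0
    for _ in range(n_steps):
        mu = rng.uniform(-1.0, 1.0)            # cosine of the polar angle
        phi = rng.uniform(0.0, 2.0 * math.pi)  # azimuthal angle
        s = math.sqrt(1.0 - mu * mu)
        x += step * s * math.cos(phi)
        y += step * s * math.sin(phi)
        z += step * mu
    return x * x + y * y + z * z

def mean_r2(n_steps, n_walkers, step=1.0, seed=1):
    """Average squared displacement over an ensemble of independent walkers."""
    rng = random.Random(seed)
    return sum(random_walk_r2(n_steps, step, rng) for _ in range(n_walkers)) / n_walkers

n_steps = 100
ratio = mean_r2(n_steps, 2000) / n_steps   # expect a value close to 1
```

With 2000 walkers the statistical scatter in `ratio` is a few per cent, so agreement with unity at the 10% level is a meaningful check of (6.35).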

6.8 Combined Scattering and Absorption

Of some interest are media in which there is both absorption, characterized by the absorption
coefficient $\alpha_\nu$, and scattering, which we shall characterize by the scattering coefficient $\sigma_\nu$, such that
the mean free path for scattering alone is $1/\sigma_\nu$. The distinction is that scattering can isotropize
radiation, but cannot thermalize it, as this requires true absorption and re-emission.

The mean free path is

$$ l_\nu = (\alpha_\nu + \sigma_\nu)^{-1}. \tag{6.36} $$

The probability that a path ends with an absorption event is

$$ \epsilon_\nu = \frac{\alpha_\nu}{\alpha_\nu + \sigma_\nu} \tag{6.37} $$

and the complementary probability $1 - \epsilon_\nu$ that it ends with a scattering is called the single scattering
albedo.

The probability that a photon gets absorbed after $N$ paths is then $P(N) \sim \epsilon N$ which approaches
unity after $N \sim \epsilon^{-1}$ paths, or equivalently after traveling a net distance

$$ l_* \sim l/\sqrt{\epsilon} = \frac{1}{\sqrt{\alpha_\nu (\alpha_\nu + \sigma_\nu)}}. \tag{6.38} $$

This is called the thermalization length or the effective mean free path. One can define the effective
optical depth for a cloud of size $R$ as

$$ \tau_* = \frac{R}{l_*} = \sqrt{\tau_a (\tau_a + \tau_s)} \tag{6.39} $$

with $\tau_a = \alpha R$ and $\tau_s = \sigma R$.

• If $\tau_* \ll 1$ then the cloud is ‘effectively thin’ and thermal photons emitted by the absorbing
particles will typically escape from the cloud without being re-absorbed. The emissivity (due
exclusively to the absorbing particles) is $j = \alpha B$ and the luminosity is $L \sim jV \sim \alpha B V$. The
brightness is $I \sim L/A \sim \alpha B R \sim \tau_a B$.

• If $\tau_* \gg 1$ the cloud is ‘effectively thick’ and one expects the radiation within the cloud to
be thermalized. In this case the photons which escape will typically have been emitted by
absorbers within a distance $\sim l_*$ of the surface, so the luminosity of the cloud will be $L \sim
A j l_* \sim \alpha B A l_* \sim \sqrt{\epsilon}\, B A$. This luminosity will be much less than that of a black-body radiator
of the same area if $\epsilon \ll 1$.
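
The quantities introduced above are simple to evaluate for given coefficients. The following sketch collects them in one helper; the function name and the numerical values (in arbitrary but consistent length units) are illustrative assumptions.

```python
import math

def thermalization(alpha, sigma, R):
    """Return (epsilon, l_star, tau_star) for absorption coefficient alpha,
    scattering coefficient sigma and cloud size R, following (6.36)-(6.39)."""
    epsilon = alpha / (alpha + sigma)      # probability a path ends in absorption
    mfp = 1.0 / (alpha + sigma)            # mean free path (6.36)
    l_star = mfp / math.sqrt(epsilon)      # thermalization length (6.38)
    tau_star = R / l_star                  # effective optical depth (6.39)
    return epsilon, l_star, tau_star

# Scattering-dominated example: a photon typically scatters ~100 times
# before being absorbed, and diffuses ~10 mean free paths while doing so.
eps, l_star, tau_star = thermalization(alpha=1.0, sigma=99.0, R=10.0)
```

In this example the cloud is effectively thick ($\tau_* = 100$), and its luminosity would be suppressed relative to a black body by a factor of order $\sqrt{\epsilon} = 0.1$.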

6.9 Rosseland Approximation 

The Rosseland approximation allows one to compute the transport of energy in a cloud with
scattering and absorption.

We can write the RTE as

$$ \frac{dI_\nu}{ds} = -\alpha_\nu (I_\nu - B_\nu) - \sigma_\nu (I_\nu - J_\nu) \tag{6.40} $$

since the absorbing particles emit with emissivity $j_\nu = \alpha_\nu B_\nu$ whereas the radiation scattered is
proportional to the mean intensity $J_\nu$.

This can also be written as

$$ \frac{dI_\nu}{ds} = -(\alpha_\nu + \sigma_\nu)(I_\nu - S_\nu) \tag{6.41} $$

where the source function is here defined as

$$ S_\nu = \frac{\alpha_\nu B_\nu + \sigma_\nu J_\nu}{\alpha_\nu + \sigma_\nu} \tag{6.42} $$

which, in the isotropic scattering model, is independent of direction.

If we assume a plane-parallel or stratified system then $dz = \mu\, ds$ with $\mu = \cos\theta$ and we have

$$ \mu \frac{\partial I_\nu}{\partial z} = -(\alpha_\nu + \sigma_\nu)(I_\nu - S_\nu) \tag{6.43} $$

The Rosseland approximation scheme exploits the fact that deep within a cloud, the radiation
will be close to isotropic, and the fractional change in the intensity over one mean free path will be
small, or equivalently, that $\partial I/\partial z \ll (\alpha + \sigma) I$.

To zeroth order we ignore $\partial I_\nu/\partial z$ entirely and readily obtain the zeroth order solution

$$ I_\nu^{(0)} = B_\nu. \tag{6.44} $$

At next higher order

$$ I_\nu^{(1)} = S_\nu - \frac{\mu}{\alpha_\nu + \sigma_\nu} \frac{\partial B_\nu}{\partial z} \tag{6.45} $$

Taking the first moment of this, the source function (which is isotropic) drops out, and we obtain
the first order flux

$$ F_\nu(z) = 2\pi \int d\mu\, \mu\, I_\nu^{(1)}(z, \mu) = -\frac{4\pi}{3(\alpha_\nu + \sigma_\nu)} \frac{\partial B_\nu(T)}{\partial T} \frac{dT}{dz} \tag{6.46} $$

The integrated flux is

$$ F(z) = -\frac{4\pi}{3} \frac{dT}{dz} \int d\nu\, (\alpha_\nu + \sigma_\nu)^{-1} \frac{\partial B_\nu(T)}{\partial T} \tag{6.47} $$

or

$$ F(z) = -\frac{16 \sigma_{\rm SB} T^3}{3 \alpha_R} \frac{dT}{dz} \tag{6.48} $$

where the Rosseland absorption coefficient $\alpha_R$ is defined as

$$ \frac{1}{\alpha_R} = \int d\nu\, (\alpha_\nu + \sigma_\nu)^{-1} \frac{\partial B_\nu(T)}{\partial T} \bigg/ \int d\nu\, \frac{\partial B_\nu(T)}{\partial T} \tag{6.49} $$

and $\sigma_{\rm SB}$ is the Stefan-Boltzmann constant, and where we have made use of $\int d\nu\, \partial B_\nu/\partial T = dB(T)/dT
= 4 \sigma_{\rm SB} T^3/\pi$.

Equation (6.48) shows the heat flux to be proportional to the temperature gradient, as might
have been expected, or more generally $\mathbf{F} \propto \nabla T$.

• The thermal conductivity is $16 \sigma_{\rm SB} T^3 / 3\alpha_R$.

• The energy transport is largest at frequencies where the combined absorption + scattering
coefficient $\alpha_\nu + \sigma_\nu$ is small.

• The Rosseland approximation requires both nearly isotropic conditions and the stronger
condition that the medium should be nearly thermalized.

• It tells us how energy seeps upwards through a scattering and absorbing medium, but it does
not describe the manner in which the intensity $I_\nu \to B_\nu$ within such a medium.

One can understand the general form of (6.48) from a simple order-of-magnitude argument.
Consider two adjacent slabs of material of thickness $\Delta z \sim l$, the mean-free-path, and with the upper
and lower slabs having temperatures $T$ and $T + \Delta T$ respectively. The energy within the upper slab
is $E = aA\Delta z T^4$, while that in the lower is $E' = aA\Delta z (T + \Delta T)^4 \simeq E + 4aA\Delta z T^3 \Delta T$. In a time
$dt \sim dl/c \sim \Delta z/c$, a substantial fraction of the photons originally in the lower slab will now be in
the upper slab and vice versa, with a corresponding transfer of energy per unit area per time of

$$ F \sim \frac{E' - E}{A\, dt} \sim a l T^3 c \frac{\Delta T}{\Delta z} \sim l \sigma_{\rm SB} T^3 \frac{\Delta T}{\Delta z} \sim \frac{\sigma_{\rm SB} T^3}{\alpha_R} \frac{dT}{dz} \tag{6.50} $$

in agreement with (6.48), up to the sign and numerical factors.
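
The Rosseland coefficient (6.49) is a harmonic mean of $\alpha_\nu + \sigma_\nu$ weighted by $\partial B_\nu/\partial T$. Written in terms of $x = h\nu/kT$, the weight is proportional to $x^4 e^x/(e^x - 1)^2$, so the mean can be computed with a one-dimensional quadrature. The sketch below does this with a midpoint rule; the function names, the cut-off `x_max` and the two-band test coefficient are hypothetical choices made for illustration.

```python
import math

def planck_weight(x):
    """Shape of dB_nu/dT as a function of x = h*nu/kT (constant factors dropped).

    Written as x^4 e^-x / (1 - e^-x)^2 to avoid overflow at large x.
    """
    return x**4 * math.exp(-x) / math.expm1(-x) ** 2

def rosseland_mean(coeff, x_max=40.0, n=40000):
    """Rosseland mean (6.49) of a total coefficient coeff(x) = alpha + sigma."""
    dx = x_max / n
    numerator = 0.0     # weighted integral of 1 / coeff
    denominator = 0.0   # integral of the weight alone
    for i in range(n):
        x = (i + 0.5) * dx
        w = planck_weight(x)
        numerator += w / coeff(x)
        denominator += w
    return denominator / numerator

gray = rosseland_mean(lambda x: 3.0)                        # frequency independent
banded = rosseland_mean(lambda x: 1.0 if x < 4.0 else 100.0)  # hypothetical window
```

For a gray medium the mean simply returns the coefficient itself, while for the two-band case the result is pulled strongly towards the smaller value, reflecting the fact that energy leaks out preferentially through the transparent window.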

6.10 The Eddington Approximation

The Eddington approximation also assumes that conditions are close to isotropic, but not necessarily
that the radiation is locally nearly thermalized.

Eddington expands the intensity as a function of angle $\mu = \cos\theta$:

$$ I_\nu(\tau, \mu) = a_\nu(\tau) + b_\nu(\tau) \mu + \ldots \tag{6.51} $$

and ignores all but the first two terms.

Taking the first three moments of the intensity gives

$$ J_\nu = \frac{1}{2} \int d\mu\, I_\nu = a_\nu, \quad H_\nu = \frac{1}{2} \int d\mu\, \mu I_\nu = b_\nu/3, \quad K_\nu = \frac{1}{2} \int d\mu\, \mu^2 I_\nu = a_\nu/3. \tag{6.52} $$

These are proportional to the mean intensity, the energy flux, and the radiation pressure respectively.
The first and last together imply the relation $K_\nu = J_\nu/3$.

Write the RTE as

$$ \mu \frac{\partial I_\nu}{\partial \tau_\nu} = I_\nu - S_\nu \tag{6.53} $$

with

$$ d\tau_\nu = -(\alpha_\nu + \sigma_\nu)\, dz \tag{6.54} $$

and

$$ S_\nu = \frac{\alpha_\nu B_\nu + \sigma_\nu J_\nu}{\alpha_\nu + \sigma_\nu} = \epsilon_\nu B_\nu + (1 - \epsilon_\nu) J_\nu \tag{6.55} $$

and take the first few moments.
The zeroth moment yields

$$ \frac{\partial H_\nu}{\partial \tau} = J_\nu - S_\nu \tag{6.56} $$

and the first moment is

$$ \frac{\partial K_\nu}{\partial \tau} = \frac{1}{3} \frac{\partial J_\nu}{\partial \tau} = H_\nu \tag{6.57} $$

and combining these and eliminating $H_\nu$ yields

$$ \frac{1}{3} \frac{\partial^2 J_\nu}{\partial \tau^2} = J_\nu - S_\nu = J_\nu - \epsilon_\nu B_\nu - (1 - \epsilon_\nu) J_\nu \tag{6.58} $$

or, more simply,

$$ \frac{1}{3} \frac{\partial^2 J_\nu}{\partial \tau^2} = \epsilon_\nu (J_\nu - B_\nu). \tag{6.59} $$

Given some temperature profile $T(z)$, and hence $B_\nu(z)$, equation (6.59) allows one to solve for
$J_\nu$ and hence the source function $S_\nu$ and from this one can obtain $I_\nu$ from the formal solution of the
RTE.

Equation (6.59) has the form of a diffusion equation and describes how the intensity relaxes
towards the black-body from within a cloud. For example, if we assume the temperature, and therefore
$B_\nu$, to be nearly independent of position then the general solution of (6.59) contains exponentially
growing and decaying terms $J - B \sim \exp(\pm\tau_*)$ with $\tau_* = \sqrt{3\epsilon}\, \tau$ the effective optical depth. If we
require that $J \to B$ deep within the cloud, this boundary condition kills the exponentially growing
term, and we find that the radiation relaxes towards black-body exponentially with a scale length
equal to the effective mean free path, in agreement with the order-of-magnitude result above.

See R+L for further discussion of this and the ‘two-stream’ approximation.

Figure 6.1: Processes considered in the Einstein ‘two-state atom’ model. The filled circle indicates
the internal energy level of the atom. Lower/upper level is the ground/excited state. Top panel is
spontaneous emission. Middle panel is absorption. Bottom panel is stimulated emission, where the
atom is ‘encouraged’ to de-excite as a result of the ambient radiation.

6.11 Einstein A, B Coefficients 

Einstein considered an idealized two-state system with energy levels $E_1$, $E_2$ in equilibrium with a
thermal radiation bath with which it can exchange photons of energy $h\nu = E_2 - E_1$. He deduced
two important facts:

• In addition to the processes of absorption to excite the system and spontaneous emission to
de-excite it there must be another process, ‘stimulated emission’, in which the system de-excites
stimulated by the ambient radiation.

• The ratios of the rates for these processes are universal and given in terms of fundamental
constants.

6.11.1 Einstein Relations

The reactions we consider are illustrated in figure 6.1. We define their rates as follows:

• Let the rate for spontaneous emission be $A_{21}$, which is the probability that the system drops
from $E_2$ to $E_1$ per unit time.

• Let $B_{12} J_\nu$ be the probability per unit time that the system be excited from $E_1$ to $E_2$, in the
ambient mean intensity $J_\nu$.

• Let $B_{21} J_\nu$ be the probability per unit time that the system be de-excited from $E_2$ to $E_1$, as a
result of the ambient mean intensity $J_\nu$. This is referred to as stimulated emission.

For an ensemble of such systems, there will in equilibrium be number densities $n_1$, $n_2$ which must
satisfy

$$ n_1 B_{12} J = n_2 A_{21} + n_2 B_{21} J \tag{6.60} $$
from which we can solve for $J$

$$ J = \frac{A_{21}/B_{21}}{(n_1/n_2)(B_{12}/B_{21}) - 1} \tag{6.61} $$

However, in thermal equilibrium, the ratio of occupation numbers must satisfy the Boltzmann law

$$ \frac{n_1}{n_2} = e^{\Delta E/kT} = e^{h\nu/kT} \tag{6.62} $$

and therefore we have

$$ J = \frac{A_{21}/B_{21}}{(B_{12}/B_{21})\, e^{h\nu/kT} - 1} \tag{6.63} $$

Comparing this with the Planck function we find that these are consistent, provided the
microscopic rates satisfy the ‘Einstein relations’

$$ B_{21} = B_{12} \tag{6.64} $$

and

$$ A_{21} = \frac{2h\nu^3}{c^2} B_{21}. \tag{6.65} $$

• These relations are somewhat reminiscent of Kirchhoff’s law, which relates emission and
absorption for matter in thermal equilibrium. This is superficial. The Einstein relations (also known
as ‘detailed balance’ relations) are more profound as they relate the microscopic absorption and
emission rates for any system, regardless of whether it happens to be in thermal equilibrium.

• The discussion here is somewhat oversimplified. See RL for inclusion of ‘statistical weights’.
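
The consistency argument can be verified numerically: if the coefficients obey (6.64) and (6.65), the equilibrium mean intensity (6.63) reproduces the Planck function, and the rates balance as in (6.60). In the sketch below all quantities are in arbitrary units, `prefactor` stands for $2h\nu^3/c^2$, and the particular numbers are illustrative assumptions.

```python
import math

def equilibrium_mean_intensity(A21, B21, B12, x):
    """Mean intensity from (6.63), with x = h*nu / kT."""
    return (A21 / B21) / ((B12 / B21) * math.exp(x) - 1.0)

def planck(prefactor, x):
    """Planck function with prefactor = 2 h nu^3 / c^2 and x = h*nu / kT."""
    return prefactor / math.expm1(x)

prefactor = 3.7           # stands in for 2 h nu^3 / c^2 (arbitrary units)
x = 1.3                   # h*nu / kT
B12 = B21 = 0.2           # Einstein relation (6.64)
A21 = prefactor * B21     # Einstein relation (6.65)

J = equilibrium_mean_intensity(A21, B21, B12, x)

# Detailed balance (6.60) with Boltzmann populations n1 / n2 = exp(x).
n2 = 1.0
n1 = math.exp(x)
imbalance = n1 * B12 * J - n2 * (A21 + B21 * J)
```

If the relations are violated, say by taking `B12 = 2 * B21`, the resulting `J` no longer has the Planck form, which is how Einstein's argument constrains the microscopic rates.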


6.11.2 Emission and Absorption Coefficients 


We can derive the absorption and emission coefficients in terms of the $A$, $B$ coefficients.
The spontaneous emission is

$$ dE = n_2 A_{21} h\nu \frac{d\Omega}{4\pi}\, dV\, \phi(\nu)\, d\nu\, dt = j_\nu\, d\Omega\, d\nu\, dV\, dt \tag{6.66} $$

where we have introduced a narrow, but finite, ‘line profile’ $\phi(\nu)$. Thus

$$ j_\nu = \frac{h\nu}{4\pi} n_2 A_{21} \phi(\nu). \tag{6.67} $$

The (uncorrected) absorption is

$$ \alpha_\nu = \frac{h\nu}{4\pi} n_1 B_{12} \phi(\nu) \tag{6.68} $$

and allowing for stimulated emission (which is most naturally considered negative absorption)

$$ \alpha_\nu = \frac{h\nu}{4\pi} B_{12} \phi(\nu) (n_1 - n_2) \tag{6.69} $$

The source function is then

$$ S_\nu = j_\nu/\alpha_\nu = \frac{n_2 A_{21}/B_{12}}{n_1 - n_2} \tag{6.70} $$

• If the matter is in thermal equilibrium, this result gives the absorption coefficient as

$$ \alpha_\nu = \frac{h\nu}{4\pi} n_1 B_{12} (1 - e^{-h\nu/kT}) \phi(\nu) \tag{6.71} $$

and source function $S_\nu = B_\nu$, in accord with Kirchhoff’s law.

• For general (i.e. non-thermal) occupation numbers, this formula gives emission for given $n_1$, $n_2$.

• If the occupation numbers are inverted, so $n_2 > n_1$, then the absorption becomes negative.
This leads to an instability with the matter dumping its energy into the radiation field giving
rise to huge photon occupation numbers. This is the process underlying lasers and masers.

6.12 Problems

6.12.1 Main sequence

Consider the optically thick interior of a star.

a) Using ‘random walk’ arguments (or otherwise) show that the radiative flux (energy/area/time)
is on the order of

$$ F \sim acT^4 \lambda/R \tag{6.72} $$

where $\lambda$ is the mean free path (MFP), $R$ is the radius and $a$ is the Stefan-Boltzmann (radiation)
constant appearing in $u = aT^4$.

b) How is the MFP related to the density of particles $n$ and their scattering cross section $\sigma$?

c) Now use the equation of hydrostatic equilibrium to show that

$$ kT \sim \frac{GMm}{R} \tag{6.73} $$

where $M$ is the mass interior to $R$ and $m$ is the mean molecular weight (assume that the gas kinetic
pressure $\gg$ radiation pressure).

d) Combine the above results to show that the net luminosity scales as the cube of
the mass, and more specifically,

$$ L \sim \frac{ac}{\sigma} \left( \frac{Gm}{k} \right)^4 m M^3. \tag{6.74} $$

e) Assuming electron scattering dominates, so the appropriate cross-section is the Thomson cross
section $\sigma \simeq 6 \times 10^{-25}$ cm$^2$, estimate the luminosity of a star of mass $2 \times 10^{33}$ g. How does this compare
to the luminosity of the sun?


6.12.2 Radiative Transfer. 

A large homogeneous sphere of material of uniform density and temperature $T$ has a scattering
coefficient $\sigma = 1 \times 10^{-2}$ cm$^{-1}$ and an absorption coefficient $\alpha = 1 \times 10^{-4}$ cm$^{-1}$ (both assumed to be
independent of frequency and depth).

a. Use random walk arguments to estimate the depth below the surface at which the radiation
approaches a thermal ‘black-body’ spectrum.

b. How does the luminosity of the sphere compare to that of a black body of the same size and
temperature?


6.12.3 Eddington luminosity 

a) Show that the condition that an optically thin cloud can be ejected by radiation pressure from a
nearby luminous object is that the mass-to-light ratio should be

$$ \frac{M}{L} < \frac{\kappa}{4\pi G c} \tag{6.75} $$

where $\kappa$ is the mass absorption coefficient for the cloud (assumed independent of frequency).

b) Show that the terminal velocity of the cloud, if it starts from rest at a distance $R$, is

$$ v = \sqrt{ \frac{2GM}{R} \left( \frac{\kappa L}{4\pi G M c} - 1 \right) } \tag{6.76} $$

c) Taking the minimum value of $\kappa$ to be that due to Thomson scattering when the cloud is fully
ionised show that the maximum luminosity the object can have and not eject hydrogen by radiation
pressure is

$$ L_{\rm edd} = \frac{4\pi G c\, m_H M}{\sigma_T} = 1.25 \times 10^{38}\ {\rm erg\ s}^{-1}\, (M/M_\odot) \tag{6.77} $$





6.12.4 Poissonian statistics 

Consider a “Poissonian” or “shot noise” bus service with buses arriving at your stop randomly
at the rate of $n$ per unit time (i.e. the probability that a bus shall arrive in a short time interval $\delta t$
is $n\, \delta t$).

a) Assuming you have just arrived at the bus stop, what is the probability distribution for the 
time you will have to wait until the next bus arrives? 

b) What is the probability distribution for intervals between bus arrivals? 
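
Answers to both parts can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original problem; the function name `simulate_bus_stop`, the rate, and the sample sizes are illustrative assumptions. It builds a Poisson process by scattering a Poisson-distributed number of buses uniformly in time, then records the intervals between buses and the waits of passengers who turn up at random moments.

```python
import numpy as np

def simulate_bus_stop(rate, t_total, n_passengers, seed=0):
    """Simulate a Poissonian bus service (hypothetical helper, not from the text).

    Returns (intervals between successive buses,
             waits of passengers who arrive at random times).
    """
    rng = np.random.default_rng(seed)
    # Poisson process on [0, t_total]: Poisson count, uniformly scattered times.
    n_buses = rng.poisson(rate * t_total)
    bus_times = np.sort(rng.uniform(0.0, t_total, n_buses))
    intervals = np.diff(bus_times)
    # Passengers arrive uniformly in time, before the last simulated bus.
    passengers = rng.uniform(0.0, bus_times[-1], n_passengers)
    next_bus = bus_times[np.searchsorted(bus_times, passengers)]
    return intervals, next_bus - passengers

intervals, waits = simulate_bus_stop(rate=2.0, t_total=50_000.0, n_passengers=100_000)
print("mean interval between buses:", intervals.mean())
print("mean wait for a passenger:  ", waits.mean())
```

Histogramming `intervals` and `waits` and comparing the two means with $1/n$ is a useful check on the analytic distributions.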
Chapter 7 


Radiation Fields 


7.1 Lorentz Force Law 

In the non-relativistic limit the force on a particle of charge $q$ is

$$ \mathbf{F} = q(\mathbf{E} + \mathbf{v} \times \mathbf{B}/c) \qquad (7.1) $$

so the rate of work is $\mathbf{v} \cdot \mathbf{F} = q\mathbf{v} \cdot \mathbf{E}$ (the magnetic force is perpendicular to the particle's
motion and therefore does no work), and hence $d(mv^2/2)/dt = q\mathbf{v} \cdot \mathbf{E}$.

For a distribution of charges with charge density $\rho = \sum q/\Delta V$ and current density $\mathbf{j} = \sum q\mathbf{v}/\Delta V$
the force per unit volume is

$$ \mathbf{f} = \rho\mathbf{E} + \frac{1}{c}\,\mathbf{j} \times \mathbf{B} \qquad (7.2) $$

and the rate of work per unit volume is $\mathbf{j} \cdot \mathbf{E}$, which is equal to the rate of increase of the mechanical
energy density.


7.2 Field Energy Density 

The electromagnetic field has energy which is quadratic in the fields $\mathbf{E}$ and $\mathbf{B}$. A simple way to
compute the electric field energy is to consider a sphere of radius $R$ carrying total charge $Q$ uniformly
distributed in a very thin shell. The electric field ramps up from zero to $E_0 = Q/R^2$ as one passes
from the inner to the outer edge of the shell, so the mean electric field felt by the charges is $E_0/2$.
If we force the sphere to contract from $R$ to $R - \Delta R$ then we must do work against the repulsive
electrostatic field

$$ \Delta W = Q(E_0/2)\Delta R = \frac{1}{2}E_0^2 R^2 \Delta R = \frac{1}{8\pi}E_0^2\,\Delta V \qquad (7.3) $$

with $\Delta V = 4\pi R^2 \Delta R$ the volume swept out. Since in the process we have created field of strength
$E_0$ in this volume, we can identify $E_0^2/(8\pi)$ with the energy density of the field.

Similar arguments can be applied to show that the magnetic field energy density is $B^2/(8\pi)$.

7.3 Maxwell’s Equations 

Maxwell’s equations are 


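$$ \begin{array}{ll} {\rm M1:}\quad \nabla \cdot \mathbf{E} = 4\pi\rho & {\rm M2:}\quad \nabla \cdot \mathbf{B} = 0 \\ {\rm M3:}\quad \nabla \times \mathbf{E} = -\dfrac{1}{c}\dfrac{\partial \mathbf{B}}{\partial t} & {\rm M4:}\quad \nabla \times \mathbf{B} = \dfrac{4\pi}{c}\mathbf{j} + \dfrac{1}{c}\dfrac{\partial \mathbf{E}}{\partial t} \end{array} \qquad (7.4) $$

which are respectively the differential statements of M1: Gauss' law; M2: the absence of free magnetic
charges; M3: Faraday's or Lenz's law of induction; and M4: Ampere's law (plus the displacement
current term) (see figure 7.1).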
[Figure 7.1 panels: Gauss' law, $\nabla \cdot \mathbf{E} = 4\pi\rho$; Faraday's law of induction; no magnetic monopoles, $\nabla \cdot \mathbf{B} = 0$.]

Figure 7.1: Maxwell's equations are, for the most part, the differential statement of the laws of
electro- and magneto-statics, plus the law of induction. Maxwell's major contribution was to see
that Ampere's law for steady currents, $\nabla \times \mathbf{B} = (4\pi/c)\mathbf{j}$, needed to be augmented by the 'displacement
current' term. In the lower right panel, for the loop on the left it is clear that we can identify the
loop integral of $\mathbf{B}$ with the current piercing the loop. For the right hand loop we can either close
the surface so that it has a current through it, or, by taking the surface to fall between the capacitor
plates, it will have no current, but it will have a net $d\mathbf{E}/dt$. Maxwell's displacement current takes
care of the latter possibility, and guarantees consistent results.


Taking the divergence of M4 and using M1 yields

$$ \frac{\partial \rho}{\partial t} + \nabla \cdot \mathbf{j} = 0 \qquad (7.5) $$

which expresses the conservation of charge (to see this integrate over some volume and invoke the
divergence theorem).

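Dotting Ampere's law with the electric field $\mathbf{E}$ gives

$$ \mathbf{j} \cdot \mathbf{E} = \frac{1}{4\pi}\left[c(\nabla \times \mathbf{B}) \cdot \mathbf{E} - \mathbf{E} \cdot \frac{\partial \mathbf{E}}{\partial t}\right] \qquad (7.6) $$

Using the result from vector calculus that $\nabla \cdot (\mathbf{A} \times \mathbf{B}) = \mathbf{B} \cdot (\nabla \times \mathbf{A}) - \mathbf{A} \cdot (\nabla \times \mathbf{B})$ we can replace
$(\nabla \times \mathbf{B}) \cdot \mathbf{E}$ by $\mathbf{B} \cdot (\nabla \times \mathbf{E}) - \nabla \cdot (\mathbf{E} \times \mathbf{B})$, and finally replacing $\nabla \times \mathbf{E}$ with $-(1/c)\,\partial \mathbf{B}/\partial t$ by Lenz's law
yields

$$ \mathbf{j} \cdot \mathbf{E} + \frac{1}{8\pi}\frac{\partial (E^2 + B^2)}{\partial t} = -\frac{c}{4\pi}\nabla \cdot (\mathbf{E} \times \mathbf{B}) \qquad (7.7) $$
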


Now the first term on the left hand side is the rate of change of the mechanical energy of the charges.
The second term is the rate of change of the field energy density, so the left hand side terms together are
the rate of change of the total energy density. This equation expresses conservation of energy, with
the right hand side being minus the divergence of a field energy flux vector, the Poynting vector:

$$ \mathbf{S} = \frac{c}{4\pi}\mathbf{E} \times \mathbf{B} \qquad (7.8) $$


The Poynting flux has some peculiar features. For example, it appears to say that a charged
bar magnet has an energy flux that circulates around the bar in a toroidal sense. This is not very
meaningful. However, if one integrates the terms in the above equation over some finite volume,
the terms on the left give the rate of change of the total energy within the volume, and the right
hand side can be converted to an integral of the Poynting flux over the surface of the volume,
and this is well defined and free from peculiarities. Note that static fields tend to fall off as $1/r^2$,
so $S \sim 1/r^4$ for such fields and the integral over the surface for a static system is $\int d\mathbf{A} \cdot \mathbf{S} \sim 1/r^2$,
which converges to zero as it should.


7.4 Electromagnetic Waves 

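Specializing to empty space, with $\rho = 0$, $\mathbf{j} = 0$, Maxwell's equations become

$$ \begin{array}{ll} {\rm M1:}\quad \nabla \cdot \mathbf{E} = 0 & {\rm M2:}\quad \nabla \cdot \mathbf{B} = 0 \\ {\rm M3:}\quad \nabla \times \mathbf{E} = -\dfrac{1}{c}\dfrac{\partial \mathbf{B}}{\partial t} & {\rm M4:}\quad \nabla \times \mathbf{B} = \dfrac{1}{c}\dfrac{\partial \mathbf{E}}{\partial t} \end{array} \qquad (7.9) $$

Taking the curl of M3 and using M4 gives $\nabla \times (\nabla \times \mathbf{E}) = -c^{-2}\,\partial^2 \mathbf{E}/\partial t^2$, and invoking the vector
identity $\nabla \times (\nabla \times \mathbf{E}) = \nabla(\nabla \cdot \mathbf{E}) - \nabla^2 \mathbf{E}$ and Gauss' law $\nabla \cdot \mathbf{E} = 0$ yields

$$ \nabla^2 \mathbf{E} - \frac{1}{c^2}\frac{\partial^2 \mathbf{E}}{\partial t^2} = 0 \qquad (7.10) $$

which has the form of a wave equation. Performing the same sequence of operations with M1 $\leftrightarrow$ M2,
M3 $\leftrightarrow$ M4 yields an identical wave equation for $\mathbf{B}$.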

Let us look for traveling wave solutions of the form 

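$$ \mathbf{E}(\mathbf{r},t) = \hat{\mathbf{e}}\,E_0\,e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)}, \qquad \mathbf{B}(\mathbf{r},t) = \hat{\mathbf{b}}\,B_0\,e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \qquad (7.11) $$

with unit vectors $|\hat{\mathbf{e}}| = |\hat{\mathbf{b}}| = 1$ and with complex amplitudes $E_0$, $B_0$, though with the understanding
that the physical field is the real part. Inserting these trial solutions in Maxwell's equations yields
the following inter-relations: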


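$$ \begin{array}{ll} {\rm M1:} & \mathbf{k} \cdot \hat{\mathbf{e}} = 0 \\ {\rm M2:} & \mathbf{k} \cdot \hat{\mathbf{b}} = 0 \\ {\rm M3:} & \mathbf{k} \times \hat{\mathbf{e}}\,E_0 = (\omega/c)\,\hat{\mathbf{b}}\,B_0 \\ {\rm M4:} & \mathbf{k} \times \hat{\mathbf{b}}\,B_0 = -(\omega/c)\,\hat{\mathbf{e}}\,E_0 \end{array} \qquad (7.12) $$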


The first two of these tell us that the waves are transverse: both $\hat{\mathbf{b}}$ and $\hat{\mathbf{e}}$ must be perpendicular to
the wave vector $\mathbf{k}$. The second pair tell us that $\hat{\mathbf{e}}$ and $\hat{\mathbf{b}}$ must be orthogonal to each other, so $\hat{\mathbf{k}}$, $\hat{\mathbf{e}}$,
and $\hat{\mathbf{b}}$ form an orthonormal triad. Inspecting the magnitude of the latter pair gives $E_0 = B_0\,\omega/kc$
and $B_0 = E_0\,\omega/kc$, which imply

$$ E_0 = B_0 \qquad (7.13) $$

and also yield the dispersion relation

$$ \omega = ck \qquad (7.14) $$

from which we can infer that

$$ \hbox{phase velocity} = \frac{\omega}{k} = \hbox{group velocity} = \frac{d\omega}{dk} = c \qquad (7.15) $$

so these waves are non-dispersive. 

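The energy density is $U_{\rm field} = (E^2 + B^2)/8\pi$. Writing $E_0 = |E_0|e^{i\phi}$ we have ${\rm Re}(E)^2 = {\rm Re}(B)^2 =
|E_0|^2 \cos^2(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi)$. The cosine-squared term averages to 1/2 so we have

$$ \langle U \rangle = \frac{1}{8\pi}|E_0|^2 = \frac{1}{8\pi}|B_0|^2. \qquad (7.16) $$

The Poynting vector is

$$ \mathbf{S} = \frac{c}{4\pi}\,{\rm Re}(E)\,{\rm Re}(B)\,\hat{\mathbf{e}} \times \hat{\mathbf{b}} = \frac{c}{4\pi}\,\hat{\mathbf{k}}\,|E_0|^2 \cos^2(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi) \qquad (7.17) $$

which points along the wave vector and has expectation value

$$ \langle S \rangle = \frac{c}{8\pi}|E_0|^2 = c\,\langle U_{\rm field} \rangle. \qquad (7.18) $$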
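
The averages in (7.16) and (7.18) are easy to confirm numerically. The sketch below is an illustration under stated assumptions: the function name `plane_wave_averages`, the amplitude, and the number of phase samples are all invented here, and the fields are taken in phase with $B_0 = E_0$ as found above (Gaussian units).

```python
import numpy as np

C_LIGHT = 2.998e10  # speed of light in cm/s (Gaussian units)

def plane_wave_averages(e0, n_steps=10_000):
    """Average energy density and Poynting flux over one wave period.

    e0 is the real amplitude |E_0|; the physical fields are taken as
    E = B = e0 * cos(phase), as for a vacuum plane wave.
    """
    phase = np.linspace(0.0, 2.0 * np.pi, n_steps, endpoint=False)
    e_field = e0 * np.cos(phase)
    b_field = e0 * np.cos(phase)
    energy_density = (e_field**2 + b_field**2) / (8.0 * np.pi)
    poynting = C_LIGHT * e_field * b_field / (4.0 * np.pi)
    return energy_density.mean(), poynting.mean()

u_mean, s_mean = plane_wave_averages(e0=3.0)
print(u_mean, 3.0**2 / (8.0 * np.pi))   # both sides of (7.16)
print(s_mean / u_mean / C_LIGHT)        # ratio <S> / (c <U>)
```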


7.5 Radiation Power Spectrum 


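If we sit in a collimated beam of radiation, the Poynting vector tells us the energy flux (energy per unit time
per unit area)

$$ \frac{dW}{dA\,dt} = \frac{c}{4\pi}E^2(t) \qquad (7.19) $$

so the average energy flux per unit area over an interval $T$ is

$$ \left\langle \frac{dW}{dA\,dt} \right\rangle = \frac{c}{4\pi}\frac{1}{T}\int_0^T dt\,E^2(t) \qquad (7.20) $$

or equivalently, by Parseval's theorem,

$$ \left\langle \frac{dW}{dA\,dt} \right\rangle = \frac{c}{4\pi}\int \frac{d\omega}{2\pi}\,\frac{|E(\omega)|^2}{T} \qquad (7.21) $$

with $E(\omega)$ the Fourier transform of $E(t)$. The squared transform of the electric field is called the
power spectrum, and is proportional to the intensity at frequency $\nu = \omega/2\pi$.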
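
The step from (7.20) to (7.21) can be checked with a discrete Fourier transform. This is a minimal sketch rather than anything from the original notes: the function name `parseval_check`, the white-noise test signal, and the sampling parameters are assumptions made for illustration. The FFT uses the opposite sign convention in the exponent, which does not affect $|E(\omega)|^2$.

```python
import numpy as np

def parseval_check(n_samples=4096, dt=0.01, seed=1):
    """Compare the time-domain and frequency-domain integrals of E^2."""
    rng = np.random.default_rng(seed)
    e_t = rng.normal(size=n_samples)              # a random 'electric field' record
    # Discrete approximation to E(omega) = int dt E(t) exp(i omega t)
    e_w = np.fft.fft(e_t) * dt
    d_omega = 2.0 * np.pi / (n_samples * dt)      # spacing of angular frequencies
    time_side = np.sum(e_t**2) * dt               # int dt E(t)^2
    freq_side = np.sum(np.abs(e_w)**2) * d_omega / (2.0 * np.pi)
    return time_side, freq_side

time_side, freq_side = parseval_check()
print(time_side, freq_side)
```

Dividing either side by the record length $T$ and multiplying by $c/4\pi$ gives the mean flux.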

For some purposes this description is too simplistic, as it hides the vector nature of the field,
which allows the possibility of polarized radiation, and for, e.g., radiative transfer one needs a more
detailed description which gives the radiation flux in the different polarization modes.


7.5.1 Polarization of Planar Waves 

A linearly polarized plane wave may be written as 


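$$ \mathbf{E}(\mathbf{r}, t) = \hat{\mathbf{a}}\,{\rm Re}\left[E_0\,e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)}\right] \qquad (7.22) $$

where $E_0$ is a complex amplitude, and $\hat{\mathbf{a}}$ is perpendicular to $\mathbf{k}$. The most general plane wave (with
$\mathbf{k} \propto \hat{\mathbf{z}}$) is

$$ \mathbf{E}(t) = {\rm Re}\left[(\hat{\mathbf{x}}E_x + \hat{\mathbf{y}}E_y)e^{-i\omega t}\right] = {\rm Re}\left[\mathbf{E}_0\,e^{-i\omega t}\right] \qquad (7.23) $$

where, for simplicity, we give the value of the field at $z = 0$. Both $E_x$ and $E_y$ are complex amplitudes,
and $\mathbf{E}_0$ is a complex vector. Writing $E_x = \mathcal{E}_x e^{i\phi_x}$ and $E_y = \mathcal{E}_y e^{i\phi_y}$, with $\mathcal{E}_x$, $\mathcal{E}_y$ real, the physical
field components are

$$ E_x(t) = \mathcal{E}_x \cos(\omega t - \phi_x), \qquad E_y(t) = \mathcal{E}_y \cos(\omega t - \phi_y) \qquad (7.24) $$

The 2-vector $\mathbf{E}(t)$ moves along a closed periodic curve in $E_x$, $E_y$ space with period $T = 2\pi/\omega$.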

In fact, this curve is an ellipse. To see this, write down the parametric formula for an ellipse with
principal axes aligned with the coordinate axes in some rotated frame, $\{E'_x, E'_y\} = \{a\cos(\omega(t - t_0)),\ b\sin(\omega(t - t_0))\}$.
Then apply the rotation operator to obtain the formula in an arbitrary frame,
rotated by some angle $\chi$:

$$ \begin{bmatrix} E_x \\ E_y \end{bmatrix} = \begin{bmatrix} \cos\chi & -\sin\chi \\ \sin\chi & \cos\chi \end{bmatrix} \begin{bmatrix} E'_x \\ E'_y \end{bmatrix} \qquad (7.25) $$

Expanding the cosine terms in (7.24) and comparing terms containing $\cos\omega t$ and $\sin\omega t$ in each of
$E_x$ and $E_y$ yields 4 equations which one can solve to obtain $a$, $b$, $t_0$ and $\chi$ for given $\mathcal{E}_x$, $\mathcal{E}_y$, $\phi_x$ and $\phi_y$.
Special cases are $a = 0$ or $b = 0$, which result in a linearly polarized wave, in which the point $\mathbf{E}(t)$
oscillates back and forth along a line through the origin. For $a = b$ the ellipse becomes a circle, and
the wave is said to be circularly polarized with helicity which is negative or positive if, for a wave
propagating towards the observer, the field value rotates clockwise or counter-clockwise respectively.
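
Rather than solving the four equations by hand, the ellipse parameters can be obtained numerically. The sketch below uses a different route from the one described above, namely a singular value decomposition: since ${\rm Re}[\mathbf{E}_0 e^{-i\omega t}] = {\rm Re}(\mathbf{E}_0)\cos\omega t + {\rm Im}(\mathbf{E}_0)\sin\omega t$, the curve is the image of the unit circle under the $2 \times 2$ matrix whose columns are ${\rm Re}(\mathbf{E}_0)$ and ${\rm Im}(\mathbf{E}_0)$. The function name `polarization_ellipse` is an assumption made for illustration, and the routine returns only $a$, $b$ and $\chi$ (not $t_0$ or the helicity).

```python
import numpy as np

def polarization_ellipse(amp_x, amp_y, phi_x, phi_y):
    """Semi-axes (a >= b) and orientation angle chi of the polarization ellipse.

    Inputs are the real amplitudes and phases of E_x and E_y as in (7.24).
    chi is returned modulo pi and is arbitrary when a == b (circular case).
    """
    e0 = np.array([amp_x * np.exp(1j * phi_x), amp_y * np.exp(1j * phi_y)])
    # E(t) = Re(E0) cos(wt) + Im(E0) sin(wt): image of the unit circle under m
    m = np.column_stack([e0.real, e0.imag])
    u, s, _ = np.linalg.svd(m)
    a, b = s                                    # singular values = semi-axes
    chi = np.arctan2(u[1, 0], u[0, 0]) % np.pi  # direction of the major axis
    return a, b, chi

print(polarization_ellipse(2.0, 1.0, 0.0, np.pi / 2))  # axes along x and y
print(polarization_ellipse(1.0, 1.0, 0.3, 0.3))        # linear, along the diagonal
```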


7.5.2 Polarization of Quasi-Monochromatic Waves 

For a nearly monochromatic wave of angular frequency $\omega_0$ (e.g. a general wave which has been passed
through a narrow band filter) we know that each component of the electric field will be locally
sinusoidal, but that the amplitude, or envelope, of the wave will evolve slowly with time. Over any
limited time $1/\omega_0 \ll \delta t \ll 1/\delta\omega$, with $\delta\omega$ the width of the filter, the wave will behave much like the
pure planar wave discussed above, but over longer periods the wave will evolve through a succession
of different planar wave states. Now it may be that this succession of states has some persistent
features; it may be that there is a general tendency for the wave to be found with $E_x > E_y$, in which
case we would say the wave has some degree of linear polarization, or it may be that the wave
tends to be found rotating counter-clockwise, in which case we would say that it has some degree
of circular polarization. On the other hand, it might be that there are no persistent correlations;
the field might be found to be rotating clockwise at one moment and counter-clockwise at another
time, for instance. If there is no systematic sense of rotation, and if there is no systematic tendency
for the field values to prefer any particular direction in field space, then we describe the wave as
unpolarized, or 'natural'.

7.5.3 Power Spectrum Tensor and Stokes Parameters 

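We can formalize this using the language of random fields. In general, the electric field for a
collimated beam is a 2-vector valued random function of time, with transform

$$ E_i(\omega) = \int dt\,E_i(t)\,e^{i\omega t} \qquad (7.26) $$

so we can write down the expectation for the product of components of transforms much as for a
scalar function:

$$ \langle E_i(\omega)E_j^*(\omega') \rangle = \int dt \int dt'\,\langle E_i(t)E_j(t') \rangle\,e^{i(\omega t - \omega' t')} = \int dt\,e^{i(\omega - \omega')t} \int d\tau\,\xi_{ij}(\tau)\,e^{i\omega\tau} \qquad (7.27) $$

where we have defined the auto-correlation function tensor

$$ \xi_{ij}(\tau) = \langle E_i(t + \tau)E_j(t) \rangle. \qquad (7.28) $$

Recognizing the first integral in (7.27) as a representation of the Dirac $\delta$-function we have

$$ \langle E_i(\omega)E_j^*(\omega') \rangle = 2\pi\,\delta(\omega - \omega')\,P_{ij}(\omega) \qquad (7.29) $$

where the power spectrum tensor is

$$ P_{ij}(\omega) = \int d\tau\,\xi_{ij}(\tau)\,e^{i\omega\tau} \qquad (7.30) $$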
This is very similar to the result for a scalar random field, but with one important distinction: the
power here is generally complex. From the definition of $\xi_{ij}(\tau)$ (7.28) we have

$$ \xi_{ij}(\tau) = \xi_{ji}(-\tau) \qquad (7.31) $$

so $\xi_{ij}(\tau)$ is symmetric under the combination of reflection in time and exchanging $i \leftrightarrow j$. For $i = j$
this implies that both $P_{xx}(\omega)$ and $P_{yy}(\omega)$ are real. For $i \ne j$ though this implies $P_{xy}(\omega) = P_{yx}^*(\omega)$.
It also follows from the definition of $P_{ij}$ that $P_{ij}(-\omega) = P_{ij}^*(\omega)$. Because of these symmetries $P_{ij}(\omega)$
has only 4 real degrees of freedom for each $|\omega|$.

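Now consider a filtered version of the field

$$ E_i^F(t) = \int \frac{d\omega}{2\pi}\,F(\omega)\,E_i(\omega)\,e^{-i\omega t} \qquad (7.32) $$

for which the two point auto-correlation function tensor is

$$ \langle E_i^F(\tau)E_j^F(0) \rangle = \int \frac{d\omega}{2\pi} \int \frac{d\omega'}{2\pi}\,F(\omega)F^*(\omega')\,\langle E_i(\omega)E_j^*(\omega') \rangle\,e^{-i\omega\tau} \qquad (7.33) $$

or, on invoking the definition of $P_{ij}(\omega)$,

$$ \langle E_i^F(\tau)E_j^F(0) \rangle = \int \frac{d\omega}{2\pi}\,|F(\omega)|^2\,P_{ij}(\omega)\,e^{-i\omega\tau} \qquad (7.34) $$

and, if $F(\omega)$ is a narrow band filter, accepting only frequencies in a narrow range about $\omega = \pm\omega_0$,

$$ \xi_{ij}^F(\tau) = \langle E_i^F(\tau)E_j^F(0) \rangle = P_{ij}(\omega_0)\,e^{-i\omega_0\tau} + P_{ij}^*(\omega_0)\,e^{i\omega_0\tau} \qquad (7.35) $$

which is manifestly real.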

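The four real components of the tensor $P_{ij}$ can be taken to be the Stokes parameters $I$, $Q$, $U$,
$V$ defined by

$$ \begin{array}{rcl} I &=& P_{xx} + P_{yy} \\ Q &=& P_{xx} - P_{yy} \\ U &=& P_{xy} + P_{yx} \\ iV &=& P_{xy} - P_{yx} \end{array} \qquad (7.36) $$

The first three of these can be expressed in terms of the zero-lag auto-correlation function tensor:

$$ \begin{array}{rcl} I &=& \frac{1}{2}\left(\xi_{xx}^F(0) + \xi_{yy}^F(0)\right) \\ Q &=& \frac{1}{2}\left(\xi_{xx}^F(0) - \xi_{yy}^F(0)\right) \\ U &=& \xi_{xy}^F(0) \end{array} \qquad (7.37) $$

while the fourth involves the correlation at a lag of 1/4 of a wave period:

$$ V = \xi_{yx}^F(\tau = \pi/2\omega_0) \qquad (7.38) $$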


• The Stokes parameters have the following significance: $I$ gives the total energy flux, or intensity.
$Q$ and $U$ describe the two states of linear polarization; for $Q = I$, $U = V = 0$ the field is fully
linearly polarized with electric field parallel to the x-axis, for $Q = -I$, $U = V = 0$ the field
is fully polarized along the y-axis. For $U = \pm I$ (and $Q = V = 0$) the field is polarized along
the diagonals. The circularity $V$ describes circular polarization, with $V = \pm I$ corresponding
to complete circular polarization in the two helicity states.

• One can show that in general $I^2 \ge Q^2 + U^2 + V^2$. For $Q = U = V = 0$ the beam is said to be
unpolarized.

• The linear polarization terms $Q$, $U$ may be measured by passing the beam through a polarizing
filter (e.g. a grid of wires for microwaves) and measuring the intensity as a function of angle.
Equation (7.38) suggests how to measure circular polarization: split a beam, introduce a 1/4
wave lag in one arm, recombine and measure the intensity.


• The Stokes parameters are additive for a superposition of independent beams. This is not the 
case if the beams are correlated; the superposition of two opposite helicity circularly polarized 
beams gives a resultant which is linearly polarized, for example. 


• The general radiative transfer equation can be stated much as for the simple unpolarized case,
but involves the 4-vector $\{I, Q, U, V\}$, and the scattering coefficient becomes a tensor quantity.
See Chandrasekhar's “Radiative Transfer” for details.


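• A general beam can be decomposed into two components

$$ \begin{bmatrix} I \\ Q \\ U \\ V \end{bmatrix} = \begin{bmatrix} I - \sqrt{Q^2 + U^2 + V^2} \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} \sqrt{Q^2 + U^2 + V^2} \\ Q \\ U \\ V \end{bmatrix} \qquad (7.39) $$

i.e. as a superposition of an unpolarized beam and a pure elliptically polarized beam.

• One can define the degree of polarization as

$$ \frac{\sqrt{Q^2 + U^2 + V^2}}{I} \qquad (7.40) $$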
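
The definitions (7.36) and (7.40) can be tried out on synthetic data. The sketch below assumes, as an illustration only, that the power spectrum tensor of a narrow-band beam can be estimated as $P_{ij} \propto \langle E_i E_j^* \rangle$ from samples of the slowly varying complex amplitudes $E_x$, $E_y$; the function names and the test beams are invented here, and the overall normalization and the sign convention for $V$ are not fixed by this sketch.

```python
import numpy as np

def stokes_parameters(ex, ey):
    """Estimate (I, Q, U, V) from samples of complex amplitudes E_x, E_y.

    Assumes P_ij is proportional to the sample average of E_i * conj(E_j).
    """
    pxx = np.mean(ex * np.conj(ex)).real
    pyy = np.mean(ey * np.conj(ey)).real
    pxy = np.mean(ex * np.conj(ey))
    pyx = np.conj(pxy)                      # P_xy = P_yx^*
    i_tot = pxx + pyy
    q = pxx - pyy
    u = (pxy + pyx).real
    v = ((pxy - pyx) / 1j).real             # from iV = P_xy - P_yx
    return i_tot, q, u, v

def degree_of_polarization(i_tot, q, u, v):
    return np.sqrt(q**2 + u**2 + v**2) / i_tot

rng = np.random.default_rng(4)
n = 200_000
# Unpolarized beam: independent complex Gaussian amplitudes in x and y
ex = rng.normal(size=n) + 1j * rng.normal(size=n)
ey = rng.normal(size=n) + 1j * rng.normal(size=n)
print(degree_of_polarization(*stokes_parameters(ex, ey)))       # close to 0
# Circularly polarized beam: E_y is E_x shifted by a quarter cycle
print(degree_of_polarization(*stokes_parameters(ex, 1j * ex)))  # close to 1
```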


7.6 Problems 

7.6.1 Maxwell’s equations 

Write down Maxwell’s equations for the fields E, B produced by a distribution of charge with density 
p and current density j. 

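Show that these equations imply charge conservation:

$$ \frac{\partial \rho}{\partial t} + \nabla \cdot \mathbf{j} = 0 \qquad (7.41) $$

Use the Lorentz force law to show that the rate per unit volume at which the fields do work on
the charges is equal to $\mathbf{j} \cdot \mathbf{E}$. Now use Maxwell's equations to show that

$$ \mathbf{j} \cdot \mathbf{E} + \frac{1}{8\pi}\frac{\partial (E^2 + B^2)}{\partial t} = -\nabla \cdot \mathbf{S} \qquad (7.42) $$

where $\mathbf{S} = c\,\mathbf{E} \times \mathbf{B}/4\pi$. Integrate this over some small volume, and, using the divergence theorem
to convert the $\mathbf{S}$ integral to a surface integral, interpret the three terms in the resulting equation.
(You may make use of the identity $\nabla \cdot (\mathbf{E} \times \mathbf{B}) = \mathbf{B} \cdot (\nabla \times \mathbf{E}) - \mathbf{E} \cdot (\nabla \times \mathbf{B})$.)

Assuming a trial solution of the wave-like form

$$ \mathbf{E} = \mathbf{E}_0\,e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)}, \qquad \mathbf{B} = \mathbf{B}_0\,e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \qquad (7.43) $$

use Maxwell's equations to show that for plane waves propagating in vacuum the vectors $\mathbf{E}_0$, $\mathbf{B}_0$ and
the wave-vector $\mathbf{k}$ form a mutually orthogonal triad; that $E_0 = B_0$; and that the phase and group
velocity for these waves are equal to $c$. Compute the mean energy density $\langle U \rangle$ and the mean of the
Poynting vector $\langle \mathbf{S} \rangle$.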
7.6.2 Energy and momentum of radiation field 

Consider a charge $q$ moving in a viscous medium with a frictional force $\mathbf{F}_{\rm visc} = -\beta\mathbf{v}$. Suppose a
circularly polarised electromagnetic wave passes through the medium so the equation of motion of
the charge is

$$ m\dot{\mathbf{v}} = \mathbf{F}_{\rm visc} + \mathbf{F}_{\rm Lorentz} \qquad (7.44) $$

Now assume further that the charged particle is very light so that its motion is such that the viscous
and Lorentz forces balance: $\mathbf{F}_{\rm visc} + \mathbf{F}_{\rm Lorentz} = 0$. Let the electric field amplitude be $E$ and the
frequency of the wave be $\omega$.

a. Show that to lowest order in $v/c$ (i.e. neglecting the magnetic contribution to $\mathbf{F}_{\rm Lorentz}$) the
charge moves in a circle in a plane normal to the wave vector with speed $qE/\beta$ and with radius
$qE/(\beta\omega)$.

b. Show that the work done by the wave per unit time is $q^2E^2/\beta$.

c. Now consider the magnetic force term. Show that this gives rise to a force parallel to the wave
vector of amplitude $q^2E^2/(\beta c)$. Thus show that in absorbing an amount of energy $dW$ from the
wave, the momentum transfer is $dP = dW/c$.

d. Show that the torque exerted on the fluid is $q^2E^2/(\beta\omega)$. What is the angular momentum
transferred to the fluid in the course of absorbing an amount of energy $dW$?

e. Show that the absorption cross section is $4\pi q^2/(\beta c)$.

f. If we now regard the radiation to be composed of photons of energy $E_\gamma = \hbar\omega$, show that
the foregoing results require that the momentum of a photon is $P = \hbar\omega/c = E_\gamma/c$ and that the
angular momentum is $J = \hbar$.



Chapter 8 

Geometric Optics 


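A perfectly planar wave is

$$ f(\mathbf{r}, t) = a\,e^{i\psi(\mathbf{r},t)} \qquad (8.1) $$

where $f$ might denote either $\mathbf{E}$ or $\mathbf{B}$ (and we shall not concern ourselves here with polarization),
and $\psi(\mathbf{r}, t) = \mathbf{k} \cdot \mathbf{r} - \omega t + \phi$ is the phase.

In geometric optics, which can be defined as the limiting case of wave optics as $\lambda \to 0$, the
radiation field locally approximates a plane wave:

$$ f(\mathbf{r}, t) = a(\mathbf{r}, t)\,e^{i\psi(\mathbf{r},t)} \qquad (8.2) $$

with slowly varying amplitude $a(\mathbf{r}, t)$ and numerically large phase, or eikonal, $\psi(\mathbf{r}, t)$.
Inserting the above in the wave equation

$$ \frac{\partial^2 f}{\partial x_i \partial x^i} = \nabla^2 f - \frac{1}{c^2}\frac{\partial^2 f}{\partial t^2} = 0, \qquad (8.3) $$

where we have defined the four-vectors $x^i = (ct, \mathbf{x})$ and $x_i = (-ct, \mathbf{x})$, gives

$$ \frac{\partial^2 f}{\partial x_i \partial x^i} = \frac{\partial^2 a}{\partial x_i \partial x^i}e^{i\psi} + 2i\frac{\partial a}{\partial x_i}\frac{\partial \psi}{\partial x^i}e^{i\psi} + if\frac{\partial^2 \psi}{\partial x_i \partial x^i} - f\frac{\partial \psi}{\partial x_i}\frac{\partial \psi}{\partial x^i} = 0. \qquad (8.4) $$

But if $\psi$ is very large, the last term here overwhelmingly dominates and we obtain the eikonal
equation

$$ \frac{\partial \psi}{\partial x_i}\frac{\partial \psi}{\partial x^i} = 0 \qquad (8.5) $$

or

$$ (\nabla\psi)^2 - \frac{1}{c^2}\dot{\psi}^2 = 0. \qquad (8.6) $$

Referring to the pure plane wave, for which $\psi = \mathbf{k} \cdot \mathbf{r} - \omega t + \phi$ and so $\nabla\psi = \mathbf{k}$ and $\dot{\psi} = -\omega$, the
eikonal equation says that

$$ k^2 - \omega^2/c^2 = 0 \qquad (8.7) $$

which is also analogous to the relativistic energy-momentum relation for a massless particle

$$ p^2 - E^2/c^2 = 0. \qquad (8.8) $$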


For the interesting case of $\omega$ = constant, we can write $\psi = -\omega t + \psi_0(\mathbf{r})$ and the eikonal equation
is

$$ (\nabla\psi_0)^2 = \frac{\omega^2}{c^2} \qquad (8.9) $$

The surfaces $\psi_0(\mathbf{r})$ = constant are wavefronts. The eikonal equation tells us that they march forward
across space with uniform spacing, and the energy propagates perpendicular to the wavefront.
The eikonal equation becomes more useful in situations where the refractive index varies with
position.

The eikonal equation can be used as a starting point to derive Fermat’s principle of least time 
according to which light rays propagate along lines of extremal time of flight. Fermat’s principle can 
also be understood in terms of constructive interference. 


8.1 Caustics 


In geometric optics light rays behave much like a stream of ballistic particles, traveling along straight 
lines in vacuum and, more generally, propagating along extremal time curves if the refractive index 
varies from place to place. The density of rays will, in general, vary from place to place, and this 
corresponds to variation in the energy flux. 

Consider a collimated uniform beam which passes through an inhomogeneous ‘phase screen’ (a 
region with spatially varying refractive index — e.g. shower glass). The phase screen results in a 
slight corrugation of the wavefront, and consequently the rays will be slightly deflected from their 
original parallel paths and this will result in spatial variation in the energy flux. 

A generic feature of such light deflection is the development of caustic surfaces on which the flux
is infinite. To analyze this, consider a phase screen which only causes a deflection in one direction
(say along the x-axis). Label the rays by their initial spatial coordinate $x$. We will also refer to
this as the Lagrangian coordinate. After suffering a phase perturbation $\phi(x)$ the wavefront will be
advanced by an amount $h = \lambda\phi/2\pi$ and the normal to the wavefront will be tilted by an angle
$\theta(x) = \nabla h(x) = \lambda\nabla\phi(x)/2\pi$.

After propagating a distance $D$ beyond the phase screen, the rays will have been deflected, or
mapped, to an Eulerian coordinate

$$ r(x) = x + D\theta(x) \qquad (8.10) $$

This is called a Lagrangian mapping. Let the rays initially have a uniform density $n_0$ in Lagrangian
space. The density in Eulerian coordinates is given by the Jacobian of the mapping since $n\,dr = n_0\,dx$
and therefore

$$ n = n_0 (dr/dx)^{-1} \propto (1 + D\,d\theta/dx)^{-1}. \qquad (8.11) $$

This means that for those rays which passed through a region of the phase screen where $d\theta/dx < 0$ the
density of rays, and therefore the energy flux, will become infinite at finite distance $D = |d\theta/dx|^{-1}$.
This is the phenomenon seen on the bottom of the swimming pool on a sunny day. A simple caustic
is shown in figure 8.1.


The generic nature of the resulting caustics can be found from a simple graphical argument.
Figure 8.2 shows $r$ vs $x$ for various depths $D$ for a simple sinusoidal deflection $\theta = \theta_0\cos(x)$. The
intermediate curve is plotted for the distance at which infinite flux first appears. At greater depth
$r(x)$ has pairs of maxima and minima at which one has a 'fold catastrophe'. Observers lying between
these points will see triple images of the distant source of illumination.
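
The graphical argument can be reproduced numerically by counting, for an observer at Eulerian position $r$, the number of Lagrangian positions $x$ that map to it. The following is a minimal sketch assuming the sinusoidal deflection $\theta = \theta_0\cos(x)$ used in the figure; the function names, grid size, and sample depths are illustrative choices, and roots are found simply by counting sign changes on a fine grid.

```python
import numpy as np

def eulerian_position(x, depth, theta0=1.0):
    """Lagrangian mapping r(x) = x + D * theta(x) with theta = theta0 * cos(x)."""
    return x + depth * theta0 * np.cos(x)

def count_images(r_obs, depth, theta0=1.0, n_grid=200_001):
    """Number of rays (images) reaching an observer at r_obs, depth D behind the screen."""
    # Any solution satisfies |x - r_obs| <= D * theta0, so search a slightly wider window.
    half_width = depth * theta0 + 1.0
    x = np.linspace(r_obs - half_width, r_obs + half_width, n_grid)
    g = eulerian_position(x, depth, theta0) - r_obs
    return int(np.count_nonzero(np.sign(g[:-1]) != np.sign(g[1:])))

# For this screen max|d(theta)/dx| = theta0, so caustics first appear at D = 1/theta0.
print(count_images(r_obs=0.3, depth=0.5))   # before caustics form
print(count_images(r_obs=1.5, depth=2.0))   # between a pair of fold points
print(count_images(r_obs=0.3, depth=2.0))   # outside the folded region
```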

The energy flux is formally infinite at positions $r_c$ corresponding to turning points (i.e. $dr/dx =
r' = 0$). In the vicinity of such a point, with Lagrangian coordinate $x_c$,

$$ r = r_c + \frac{1}{2}r''(x_c)(x - x_c)^2 + \ldots \qquad (8.12) $$

so a small interval $\Delta r = r - r_c$ neighboring a caustic corresponds to an interval in Lagrangian space
of

$$ \Delta x = \left(\frac{2}{r''(x_c)}\right)^{1/2}\sqrt{\Delta r} \qquad (8.13) $$

and since the rays are uniformly distributed in $x$, the density of rays near a caustic is

$$ n \propto \Delta x/\Delta r \propto 1/\sqrt{\Delta r}. \qquad (8.14) $$

This is the universal scaling law for the intensity close to fold caustics, the generic type of
caustic. The flux density diverges inversely as the square root of distance from the caustic surface,
$F \propto 1/\sqrt{\Delta r}$, but the integrated flux $\int dr\,F$ from the caustic is finite.
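
The inverse square-root scaling can be verified for a concrete mapping. The sketch below is an illustration under stated assumptions: it takes the sinusoidal screen $\theta = \theta_0\cos(x)$ with $D\theta_0 = 2$ (values chosen here, not in the text), locates the two rays on either side of one fold with Newton iteration, and sums $n/n_0 = 1/|dr/dx|$ over them. The third, distant image contributes only a smooth background and is ignored.

```python
import numpy as np

DEPTH, THETA0 = 2.0, 1.0   # illustrative values with DEPTH * THETA0 > 1

def r_of_x(x):
    return x + DEPTH * THETA0 * np.cos(x)

def dr_dx(x):
    return 1.0 - DEPTH * THETA0 * np.sin(x)

def ray_density_near_fold(delta_r, n_newton=30):
    """n/n0 from the two merging rays at distance delta_r inside a fold caustic."""
    x_c = np.pi - np.arcsin(1.0 / (DEPTH * THETA0))   # dr/dx = 0, local minimum of r
    r_c = r_of_x(x_c)
    r2 = -DEPTH * THETA0 * np.cos(x_c)                # r''(x_c) > 0
    density = 0.0
    for sign in (+1.0, -1.0):
        x = x_c + sign * np.sqrt(2.0 * delta_r / r2)  # quadratic first guess, cf. (8.13)
        for _ in range(n_newton):                     # Newton solve of r(x) = r_c + delta_r
            x -= (r_of_x(x) - r_c - delta_r) / dr_dx(x)
        density += 1.0 / abs(dr_dx(x))
    return density

delta_r = np.logspace(-6, -3, 7)
density = np.array([ray_density_near_fold(d) for d in delta_r])
slope = np.polyfit(np.log(delta_r), np.log(density), 1)[0]
print("logarithmic slope of n versus delta_r:", slope)
```

The fitted slope should be close to $-1/2$, and $n\sqrt{\Delta r}$ should approach $\sqrt{2/r''(x_c)}$.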
Figure 8.1: Formation of a caustic. The rays illustrated here (propagating left to right) have been
subjected to a smoothly varying deflection field. This leads to focusing of rays and formation of
a caustic. It is apparent that at each point inside the caustic surface there are three different ray
directions, and an observer would see three images of the assumed distant source. The high spatial
density of rays just inside the caustic surface corresponds to high amplification of the flux.


A counter-example is a well-figured lens, which produces a different type of singularity: the flux
becomes infinite at a point rather than on a surface. However, this is a special and degenerate case.
For any finite degree of aberration of the lens the perfect conical focus will degenerate into an 'astroid'
type caustic.

Caustics are always associated with the appearance of multiple images. As one passes through 
the surface, a pair of infinitely bright images will appear at the same point on the sky. They will 
then rapidly move apart and become fainter. Caustic surfaces may be nested within each other, and 
further pairs of images may appear. This leads to the ‘odd number of images’ theorem often invoked 
in gravitational lensing. 

We have assumed here perfect geometric optics. For finite wavelength the caustics may be
smeared out. We have also worked in 1-dimension for simplicity, but the results are readily extendible
to a 2-dimensional deflection screen, for which the amplification, for instance, is given by the inverse
of the Jacobian of the transformation from Lagrangian to Eulerian space:

$$ A = \left|\delta_{ij} + D\frac{\partial^2 h}{\partial r_i \partial r_j}\right|^{-1} \qquad (8.15) $$


This is readily generalized to finite source distance. Writing $D_{LS}$ for the distance from the source
to the deflecting screen (or lens), $D_{OL}$ for the distance from the observer to the screen, and $D_{OS}$ for
the distance from the source to the observer (i.e. the point where we measure the energy flux) gives

$$ A = \left|\delta_{ij} + \frac{D_{LS}D_{OL}}{D_{OS}}\frac{\partial^2 h}{\partial r_i \partial r_j}\right|^{-1} \qquad (8.16) $$

[Figure 8.2 plot, titled '1-D deflection': $r$ against $x$.]

Figure 8.2: Eulerian $r$ coordinate versus Lagrangian $x$ for a sinusoidal deflection screen for increasing
distance behind the deflection screen.


One can look at this from an alternative point of view. Consider a narrow conical bundle of rays 
propagating back from the observer and map these onto the source plane. For a narrow cone, these 
get mapped to an ellipse, with area proportional to the amplification. This is consistent with the 
result that the intensity, or surface brightness, of sources is not affected by static deflections; the 
way in which the amplification arises is by changes in the apparent angular size of sources. 

The matrix appearing here is called the distortion tensor and can be used to describe the change 
in shapes of objects seen through inhomogeneous refractive media. 


8.2 Random Caustics 

In many cases of interest the phase screen is a random function. One example is gravitational ‘micro- 
lensing’, and other random deflecting screens arise in the interstellar medium and in the atmosphere, 
though in the latter cases, the situation is generally complicated by diffraction. Another physical 
realization of random caustics which is of great familiarity is the illumination pattern on the bottom 
of the swimming pool. 
8.2.1 Probability for Amplification 

An observer monitoring a source through a random refractive medium will see spikes in the flux from 
the source as the caustic surfaces sweep past (usually because of motion of the observer relative to 
the source). A direct corollary of the universal scaling law for the profiles of fold caustics is that there 
is a universal probability law for high amplifications, regardless of whether the deflecting screen is 
highly non-Gaussian (as in micro-lensing) or Gaussian (as in the swimming pool or the atmosphere). 

The probability that a source will appear amplified by at least a factor $A$ will be proportional
to the fraction of Eulerian space in which the density of rays is $n > An_0$. Since $n \propto 1/\sqrt{\Delta r}$, this
fraction is proportional to $\Delta r \propto 1/A^2$, so $P(>A) \propto 1/A^2$, and therefore the differential probability
distribution is

$$ p(A)\,dA \propto dA/A^3. \qquad (8.17) $$
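
The argument can be illustrated with a toy Monte Carlo. The sketch assumes observers placed uniformly in distance $\Delta r$ from a fold, in units chosen so that $A = \Delta r^{-1/2}$ exactly; the function name, sample size, and thresholds are illustrative assumptions, and the smooth background of non-caustic rays is ignored.

```python
import numpy as np

def high_amplification_tail(thresholds, n_samples=1_000_000, seed=2):
    """Fraction of observers seeing amplification greater than each threshold."""
    rng = np.random.default_rng(seed)
    # Observers uniformly distributed in distance from the fold caustic.
    delta_r = rng.uniform(0.0, 1.0, n_samples)
    delta_r = delta_r[delta_r > 0.0]          # guard against an exact zero
    amplification = delta_r ** -0.5           # fold scaling law (8.14)
    return np.array([(amplification > a).mean() for a in thresholds])

thresholds = np.array([2.0, 4.0, 8.0, 16.0])
fractions = high_amplification_tail(thresholds)
print(fractions * thresholds**2)   # roughly constant if P(>A) scales as 1/A^2
```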

8.2.2 Caustics from Gaussian Deflections 

For a Gaussian random phase screen one can compute the power spectrum and auto-correlation 
function of the illumination pattern. 

Consider a set of finely, but uniformly, spaced initially parallel rays in Lagrangian space, with
$x_i = i\Delta x$. We will work here in 1-dimension for simplicity, but the results are readily generalized to
2-dimensions.

Mapping these rays to Eulerian space yields a density field

$$ n(r) = \sum_i \delta(r - x_i - D\theta(x_i)) \qquad (8.18) $$

with transform

$$ \tilde{n}(k) = \int dr\,n(r)\,e^{ikr} = \sum_i e^{ik(x_i + D\theta(x_i))} \to \int dx\,e^{ikx}e^{ikD\theta(x)}. \qquad (8.19) $$

The power spectrum of the density of rays in Eulerian space is

$$ P_E(k) \propto \langle |\tilde{n}(k)|^2 \rangle \qquad (8.20) $$

or

$$ P_E(k) \propto \int dx \int dx'\,\langle e^{ikD(\theta(x) - \theta(x'))} \rangle\,e^{ik(x - x')}. \qquad (8.21) $$

For a zero-mean Gaussian random variate $f$ it is easy to show that $\langle e^{if} \rangle = e^{-\langle f^2 \rangle/2}$. Now the quantity
$kD(\theta(x) - \theta(x'))$ is a Gaussian random variate, so

$$ \langle e^{ikD(\theta(x) - \theta(x'))} \rangle = e^{-k^2D^2S_\theta(x - x')/2} \qquad (8.22) $$

with $S_\theta(x) = \langle (\theta(x) - \theta(0))^2 \rangle$ the structure function for the deflection angle, and so

$$ P_E(k) \propto \int dx\,e^{-k^2D^2S_\theta(x)/2}\,e^{ikx}. \qquad (8.23) $$

This gives the power-spectrum of, and therefore the auto-correlation of, the density in Eulerian space
in terms of the two point function or power spectrum of the deflection in Lagrangian space, since

$$ S_\theta(x)/2 = \int \frac{dk}{2\pi}\,P_\theta(k)(1 - \cos kx). \qquad (8.24) $$

An interesting model is a power-law power spectrum $P_\theta(k) = P_*(k/k_*)^n$. The structure function
is then

$$ S_\theta(x)/2 = P_*k_*^{-n}\int \frac{dk}{2\pi}\,k^n(1 - \cos kx) = P_*k_*^{-n}|x|^{-(n+1)}I_n \qquad (8.25) $$
with

$$ I_n \equiv \int \frac{dy}{2\pi}\,y^n(1 - \cos y). \qquad (8.26) $$

Convergence of this dimensionless integral at low and high frequencies requires $-3 < n < -1$, and
for spectral indices in this range, the structure function is a power law in $x$:

$$ S_\theta(x)/2 = I_nP_*k_*^{-n}|x|^{-(n+1)}. \qquad (8.27) $$

As a specific example, consider the case $n = -2$. We then have $S_\theta(x)/2 = I_{-2}P_*k_*^2|x|$, or
equivalently $k^2D^2S_\theta(x)/2 = k^2|x|/k_0(D)$ with

$$ k_0(D) = \frac{1}{D^2P_*k_*^2I_{-2}} \qquad (8.28) $$

so

$$ P_E(k) \propto \int dx\,e^{-k^2|x|/k_0}\,e^{ikx} = \int_0^\infty dx\,e^{-(k^2/k_0 - ik)x} + {\rm c.c.} = \frac{2k_0}{k_0^2 + k^2}. \qquad (8.29) $$


This functional form is known as a Lorentzian profile. The power spectrum thus has a double
power law spectrum, $P_E(k) \propto k^0$, $1/k^2$ at low and high spatial frequencies respectively. The break
between these two asymptotic power laws occurs at $k = k_0 \sim 1/(k_*^2D^2P_*)$. The significance of this
characteristic scale is made more transparent if we note that the rms deflection angle on scale $\lambda \sim 1/k$
is $\Delta\theta(k) \sim \sqrt{kP_\theta(k)} = \sqrt{P_*k_*^2/k}$, so $\Delta\theta(k_0) \sim \sqrt{P_*k_*^2/k_0} = 1/(k_0D)$, or equivalently, $\lambda_0 \sim D\,\Delta\theta(\lambda_0)$.
The characteristic scale $\lambda_0$ is such that the 'focal length' for fluctuations on this scale is on the order
of $D$. Since $\Delta\theta(\lambda) \propto \sqrt{\lambda}$ for this spectral index, this implies that the characteristic scale varies with
distance as $\lambda_0 \propto D^2$.
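
The closed form in (8.29) can be confirmed by direct numerical integration. This is a minimal sketch, with the function names, the truncation of the integration range, and the grid size chosen here for illustration; it evaluates $\int dx\,e^{-k^2|x|/k_0}\cos kx$ with a hand-written trapezoid rule (the sine part vanishes by symmetry).

```python
import numpy as np

def illumination_power(k, k0, n_decay_lengths=40.0, n_grid=400_001):
    """Numerically evaluate int dx exp(-k^2 |x| / k0) exp(i k x)."""
    alpha = k * k / k0                         # exponential decay rate in x
    x_max = n_decay_lengths / alpha            # truncate where the integrand is negligible
    x = np.linspace(-x_max, x_max, n_grid)
    integrand = np.exp(-alpha * np.abs(x)) * np.cos(k * x)
    dx = x[1] - x[0]
    # Trapezoid rule written out explicitly
    return dx * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def lorentzian(k, k0):
    return 2.0 * k0 / (k0**2 + k**2)

for k in (0.1, 1.0, 10.0):
    print(k, illumination_power(k, 1.0), lorentzian(k, 1.0))
```

The flat low-$k$ plateau and the $1/k^2$ tail are both visible in the printed values.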

It is interesting to note that the above result actually violates the 'universal scaling law' for
high amplification $P(A) \propto 1/A^3$ defined above. To see that these are in conflict note that the
second moment of the amplification computed from $P(A)$ is $\langle A^2 \rangle = \int dA\,A^2P(A)$ and is logarithmically
divergent at large $A$. Computing $\langle A^2 \rangle = \int dk\,P_E(k)$ using the above equation, in contrast, gives
a perfectly finite result. The resolution of this seeming paradox is that in deriving the universal
scaling we implicitly assumed that the deflection is smoothly varying, and, in particular, that the
mean square gradient of the deflection $\langle \theta'^2 \rangle$ is finite. For the $n = -2$ case considered above, however,
the mean square deflection gradient is $\langle \theta'^2 \rangle \sim \int dk\,k^2P_\theta(k)$, which is ultra-violet divergent (i.e. it
grows without limit as $k \to \infty$).

If we modify the deflection spectrum and impose some kind of cut-off at large $k$ such that
$\int dk\,k^2P(k)$ becomes finite then we recover the log-divergent mean square amplification predicted
by $P(A) \propto A^{-3}$. To see this note that $\xi_\theta''(0) = -\int (dk/2\pi)\,k^2P(k)$ is then finite. Writing the structure
function as $S_\theta(x) = 2(\xi_\theta(0) - \xi_\theta(x))$ and expanding $\xi_\theta(x) = \xi_\theta(0) + \frac{1}{2}\xi_\theta''(0)x^2 + \ldots$ we see that
$S_\theta(x) \propto x^2$ as $x \to 0$, as compared to $S_\theta(x) \propto |x|$ for the pure $n = -2$ power law. Finally, inserting
$S_\theta(x) \propto x^2$ in (8.23) and integrating over $k$ we find $\int dk\,P_E(k) \sim \int dx/x$ as $x \to 0$, which is indeed
logarithmically divergent.

What is happening here physically is that the high frequency deflections in the pure power law 
model are disrupting the caustics that would otherwise form. 

The same type of analysis can be applied in 3-dimensions to the ‘Zel’dovich approximation’ for 
growth of cosmological structure. 








Chapter 9 

Diffraction Theory 


Geometric optics applies in the limit $\lambda \to 0$, or effectively $\lambda \ll$ the size of the ‘wave-packet’. For finite $\lambda$, the classical wave uncertainty principle says that a packet of size $L$ cannot be perfectly collimated, but must have a range of momenta $\delta k/k \gtrsim \lambda/L$, and this results in spreading of the wave packet.

Diffraction theory extends geometric optics to allow for the effect of finite extent of stops, apertures, baffles, pupils etc. It is also needed for a proper treatment of scintillation, since diffraction effects will tend to smear out the geometric optics caustics discussed above.

Consider a plane wave with $\mathbf{k} \propto \hat{\mathbf{z}}$ incident on some kind of aperture or ‘pupil’ in the plane $z = 0$. A proper treatment would involve solution of Maxwell’s equations with appropriate boundary conditions at $z = -\infty$ and on the surface of the absorbing stop. Diffraction theory uses a simple, but physically appealing, approximation. We assume that on a surface covering the aperture the wave amplitude ($f = E, B$) is what one would obtain in the absence of the aperture

$$ f(x, y, z = 0, t) = f_0 e^{-i\omega t} \tag{9.1} $$



Figure 9.1: Parallel vertical lines at the left represent wavefronts in a collimated beam incident on an aperture. In diffraction theory the field amplitude at some field point $\mathbf{r}$ behind the aperture is computed as the sum over ‘Huygens’ wavelets’ (indicated by the dashed lines) of complex phase factors $e^{i\psi}$ with $\psi$ the path length $R$ in units of $\lambda/2\pi$.
with $\omega = ck$, and that the amplitude of the field at some point to the right of the aperture (i.e. at $z > 0$) is obtained by summing over elements of the wavefront, called ‘Huygens’ wavelets’, of the amplitude times a factor $e^{i\psi}/R$ where $R$ is the distance from the wavelet to the point in question and $\psi = kR$ is the phase corresponding to path length $R$ for a wave with this wavenumber. More precisely, the factor also contains $\mu$, the cosine of the angle between the vector $\mathbf{R}$ and the normal to the surface element, but here we mostly consider situations where $\mu$ is very close to unity. Mathematically, this is

$$ f(x, y, z, t = 0) = \int dA\, \mu\, \frac{e^{i\psi}}{R} = \int\!\!\int dx'\, dy'\, \mu\, \frac{e^{ikR}}{R} \tag{9.2} $$


with $R = \sqrt{(x - x')^2 + (y - y')^2 + z^2}$. The energy density is obtained by squaring the field amplitude:

$$ u(\mathbf{r}) = \left| \int dA\, \mu\, \frac{e^{i\psi}}{R} \right|^2 \tag{9.3} $$


This is for sharp-edged stops that either transmit the full amplitude or completely block it. The theory can be generalized to apertures with soft edges by introducing a continuous aperture function $A(x', y')$ in the above equations. The square of this function is the fraction of energy transmitted. A phase screen can be described by a complex aperture function.

For a rigorous derivation and discussion of the domain of validity of this approximation see Born and Wolf.

In the geometric optics limit, the effect of a sharp-edged aperture is to introduce a sharp-edged 
shadow. At small distances behind the aperture, the effect of diffraction is essentially to soften the 
edges of the shadow. At very large distances, the spreading of the beam is large compared to the 
width of the aperture, and all that is relevant is the distribution over direction of the wave energy. 
These two regimes are described as Fresnel diffraction and Fraunhofer diffraction respectively. 


9.1 Fresnel Diffraction 


A classic problem in Fresnel diffraction theory is to compute the shadow of a knife edge as seen on a screen placed at some distance $D$ behind it. Let $(x, y)$ denote the position in the plane of the knife-edge, and $(x_0, y_0)$ the position on the screen (see figure 9.3). By symmetry, the intensity on the screen is independent of $y_0$, so let’s calculate the intensity at $(x_0, y_0 = 0)$.

The amplitude on the screen is

$$ f(x_0) = \int_{-\infty}^{\infty} dy \int_{x > 0} dx\, e^{ik\sqrt{D^2 + (x - x_0)^2 + y^2}}. \tag{9.4} $$

Consider first some point at large $x_0$, i.e. well away from the geometric shadow. The complex phase factor is stationary for $x = x_0$, $y = 0$ and has constant value $e^{ikD}$. Expanding the path length as we move away from $(x, y) = (x_0, 0)$ gives

$$ \psi = k\sqrt{D^2 + (x - x_0)^2 + y^2} \simeq kD + k[(x - x_0)^2 + y^2]/2D \tag{9.5} $$

so the phase change is small compared to unity over a region in the knife-edge plane of size $\delta x, \delta y \sim \sqrt{D/k}$ but beyond this the phase change is large and increases rapidly with distance, resulting in strong destructive interference. The real part of the complex phase factor is shown in figure 9.2. The upshot of this is that the contribution to the amplitude is dominated by a small region of size $\sqrt{D/k}$ around $(x, y) = (x_0, 0)$. This size is called the Fresnel length $r_f \sim \sqrt{D/k} \sim \sqrt{D\lambda}$, i.e. the geometric mean of the path length and the wavelength.

Figure 9.2: The surface shows the real part of the function $e^{ir^2}$ which appears in Fresnel diffraction calculations.

At a position $x_0$ much greater than the Fresnel length, $x_0 \gg \sqrt{D/k}$, the effect of the knife edge will be negligible, since it cuts off the contribution from regions of the aperture plane which would give a negligible contribution even without the screen. For general $x_0$ we need to allow for the limitation on the integral from the knife-edge and we have


$$ f(x_0) \propto \int dy\, e^{iky^2/2D} \int_{x > 0} dx\, e^{ik(x - x_0)^2/2D} \tag{9.6} $$


(see figure 9.3). The $y$-integration introduces a constant factor, which we shall ignore, since we are interested here in the shape of the illumination pattern on the screen. Introducing a dimensionless and shifted $x$-variable $\eta(x) = (x - x_0)\sqrt{k/2D}$ we have for the field amplitude

$$ f(x_0) \propto \int_{-w(x_0)}^{\infty} d\eta\, e^{i\eta^2} = \int_{-w(x_0)}^{\infty} d\eta\, (\cos\eta^2 + i \sin\eta^2) \tag{9.7} $$

which is clearly a dimensionless function of

$$ w(x_0) = \sqrt{k/2D}\; x_0. \tag{9.8} $$


It is not difficult to show that for large negative $w$ (i.e. well inside the geometric shadow) the amplitude is $f \propto 1/|w|$, so the intensity $I \sim |f|^2$ is proportional to $1/w^2$. For large positive $w$ the integral is nearly constant with decaying wave-like ripples with scale $\delta w \sim 1$ or $\delta x_0 \sim \sqrt{D\lambda} \sim r_f$. The exact result for the intensity can readily be expressed in terms of the tabulated ‘Fresnel integrals’. For a more detailed and rigorous discussion of this problem see Landau and Lifshitz vol 2.
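The knife-edge integral (9.7) is easy to evaluate numerically. The following is a minimal sketch, not a definitive implementation: the function names are illustrative, the finite part of the integral is done with Simpson's rule, and the improper part uses the standard result $\int_0^\infty e^{i\eta^2} d\eta = \sqrt{\pi/8}\,(1 + i)$. The intensity is normalised to the unobstructed value.

```python
import cmath
import math


def fresnel_partial(w, n=2000):
    """Simpson estimate of the signed integral of exp(i*eta^2) from 0 to w.

    n must be even. A negative w gives a negative step and hence the
    correctly signed integral, because the integrand is even in eta.
    """
    if w == 0.0:
        return 0j
    h = w / n
    total = 1.0 + cmath.exp(1j * w * w)          # end points eta = 0 and eta = w
    for j in range(1, n):
        eta = j * h
        weight = 4 if j % 2 else 2               # Simpson weights
        total += weight * cmath.exp(1j * eta * eta)
    return total * h / 3


def knife_edge_intensity(w):
    """|f|^2 for f = integral of exp(i*eta^2) from -w to infinity,
    divided by the unobstructed value pi."""
    half_line = math.sqrt(math.pi / 8) * (1 + 1j)   # integral from 0 to infinity
    amplitude = half_line + fresnel_partial(w)
    return abs(amplitude) ** 2 / math.pi


if __name__ == "__main__":
    for w in (-5.0, -2.0, 0.0, 1.0, 2.0, 5.0):
        print(f"w = {w:5.1f}   I/I0 = {knife_edge_intensity(w):.4f}")
```

At the geometric shadow edge ($w = 0$) only half of the unobstructed amplitude survives, so the intensity is one quarter of the unobstructed value; deep in the shadow the intensity falls as $1/w^2$, and on the illuminated side it oscillates about unity.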

Consider now instead of a knife-edge an aperture of width $L$. We saw for a knife-edge that diffraction scatters light a distance $\sim \sqrt{D\lambda}$ beyond the geometric shadow. If the distance to the screen satisfies $\sqrt{D\lambda} \ll L$ then diffraction results in a relatively small modification to the edge of the shadow pattern, much as for the knife-edge. However, this condition fails to apply for sufficiently large aperture-screen distance $D \gtrsim L^2/\lambda$. In the latter regime, the Fresnel scale exceeds the width of the aperture, and a different approach is required.

Figure 9.3: To calculate the field amplitude at the point $\mathbf{r}_0 = (x_0, 0)$ in the observer plane at the right we need to integrate the Fresnel function $e^{i\pi|\mathbf{r} - \mathbf{r}_0|^2/D\lambda}$ over the unobscured region in the knife-edge plane. The first few zero rings of the real part of the Fresnel function are illustrated schematically.


9.2 Fraunhofer Diffraction 

For a two dimensional aperture $A(\mathbf{r})$, the field amplitude at a point $\mathbf{r}_0 = (x_0, y_0)$ is again a 2-dimensional integral over the aperture plane:

$$ f(\mathbf{r}_0) = \int d^2r\, A(\mathbf{r})\, e^{ik\sqrt{D^2 + |\mathbf{r}_0 - \mathbf{r}|^2}} \tag{9.9} $$

As before, we can expand the phase factor as

$$ \psi = k\sqrt{D^2 + |\mathbf{r}_0 - \mathbf{r}|^2} \simeq kD + k|\mathbf{r}_0|^2/2D - k\,\mathbf{r}_0 \cdot \mathbf{r}/D + k|\mathbf{r}|^2/2D \tag{9.10} $$

Now if the aperture-screen distance is very large, $D \gg kL^2$, then since $r < L$ the last term here is always small compared to unity and may be neglected, and replacing $k \to 2\pi/\lambda$ we have

$$ f(\mathbf{r}_0) = e^{i(kD + k|\mathbf{r}_0|^2/2D)} \int d^2r\, A(\mathbf{r})\, e^{-2\pi i\, \mathbf{r}_0 \cdot \mathbf{r}/\lambda D}. \tag{9.11} $$

The phase factor in front of the integral is irrelevant, and we recognize this as the transform of the aperture function: $f(\mathbf{r}_0) \propto \tilde{A}(2\pi\mathbf{r}_0/\lambda D)$.

This gives the amplitude as a function of $\mathbf{r}_0$ on a screen at great distance $D$, or equivalently the amplitude in a direction $\boldsymbol{\theta} = \mathbf{r}_0/D$, or again equivalently the amplitude for waves scattered from initial wave-number $\mathbf{k}$ into waves with wave-number

$$ \mathbf{k}' = \mathbf{k} + \mathbf{q} = \mathbf{k} + k\boldsymbol{\theta} \tag{9.12} $$

and squaring the amplitude gives the intensity for scattered radiation in direction $d\Omega = d\theta_x\, d\theta_y = dq_x\, dq_y/k^2$

$$ I(\boldsymbol{\theta})\, d\Omega \propto |\tilde{A}(\mathbf{q})|^2\, d^2q. \tag{9.13} $$

Figure 9.4: Two complementary screens $A_1(\mathbf{r})$ and $A_2(\mathbf{r})$ such that $A_1 + A_2 = 1$, so the light passed by one is blocked by the other and vice versa. Babinet’s principle tells us that the distribution of scattered light is the same for the two screens. Note that in the case of $A_2$, for which the blockage is finite in extent, there is an (infinitely) large non-scattered component.

The key assumption here is that the width of the aperture is much smaller than the Fresnel scale $\sqrt{D\lambda}$. This greatly simplifies the maths, as only a small part of the Fresnel function then contributes, and we can approximate $e^{i\pi r^2/D\lambda}$ by a simple plane wave over the limited region that contributes to the field amplitude calculation.
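As an illustration of the Fraunhofer result, the one-dimensional analogue of (9.11) for a slit of width $L$ can be integrated directly and compared with the analytic transform of a box-car. This is a sketch under stated assumptions: the function names and the chosen slit width and wavelength are illustrative, and the integral is done with a simple midpoint rule.

```python
import cmath
import math


def slit_amplitude(theta, width, wavelength, n=2000):
    """Midpoint-rule estimate of the integral of exp(-2*pi*i*theta*x/lambda)
    over a slit -width/2 < x < width/2 (1-d analogue of eq. 9.11)."""
    dx = width / n
    total = 0j
    for j in range(n):
        x = -width / 2 + (j + 0.5) * dx
        total += cmath.exp(-2j * math.pi * theta * x / wavelength) * dx
    return total


def sinc_amplitude(theta, width, wavelength):
    """Analytic transform of the box-car aperture: width * sin(u)/u,
    with u = pi * theta * width / lambda."""
    u = math.pi * theta * width / wavelength
    return width if u == 0 else width * math.sin(u) / u


if __name__ == "__main__":
    wavelength = 5e-7      # metres (assumed, optical light)
    width = 1e-4           # metres (assumed slit width)
    first_null = wavelength / width
    for frac in (0.0, 0.5, 1.0, 1.5):
        theta = frac * first_null
        numeric = abs(slit_amplitude(theta, width, wavelength))
        analytic = abs(sinc_amplitude(theta, width, wavelength))
        print(f"theta = {frac:.1f} lambda/L   numeric {numeric:.3e}   analytic {analytic:.3e}")
```

The first null of the pattern falls at $\theta = \lambda/L$, which is the origin of the familiar $\lambda/D$ scaling for angular resolution.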


9.2.1 Babinet’s Principle 

There is a useful relation between the beam pattern for complementary screens $A_1$, $A_2$ where holes in one match obscuration in the other, or, more generally, where

$$ A_1(\mathbf{r}) + A_2(\mathbf{r}) = 1 \tag{9.14} $$

(see figure 9.4). Transforming this gives $\tilde{A}_1(\mathbf{q}) + \tilde{A}_2(\mathbf{q}) \propto \delta(\mathbf{q})$. Now the radiation scattered out of the original beam has, by definition, $\mathbf{q} \ne 0$, so $\tilde{A}_1(\mathbf{q}) = -\tilde{A}_2(\mathbf{q})$ and therefore

$$ I_1(\boldsymbol{\theta}) = I_2(\boldsymbol{\theta}) \tag{9.15} $$

so the intensity of radiation scattered out of the original direction is the same for complementary screens, which is Babinet’s principle.
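Babinet’s principle can be checked on a small periodic grid, where the delta function at $\mathbf{q} = 0$ becomes the single zero-frequency bin of a discrete Fourier transform. The sketch below is a one-dimensional toy model with an arbitrarily chosen slit; the grid size and slit position are assumptions made purely for illustration.

```python
import cmath
import math


def dft(a):
    """Plain O(n^2) discrete Fourier transform of a sequence."""
    n = len(a)
    return [sum(a[j] * cmath.exp(-2j * math.pi * j * m / n) for j in range(n))
            for m in range(n)]


n = 64
screen1 = [1.0 if 24 <= j < 40 else 0.0 for j in range(n)]   # a slit
screen2 = [1.0 - a for a in screen1]                          # the complementary strip

t1, t2 = dft(screen1), dft(screen2)

# Scattered intensity: every bin except q = 0 (index 0)
scattered1 = [abs(t) ** 2 for t in t1[1:]]
scattered2 = [abs(t) ** 2 for t in t2[1:]]
max_diff = max(abs(p - q) for p, q in zip(scattered1, scattered2))

if __name__ == "__main__":
    print("largest difference in scattered intensity:", max_diff)
    print("sum of unscattered amplitudes:", (t1[0] + t2[0]).real)
```

The two scattered intensity patterns agree to rounding error, while the unscattered ($q = 0$) amplitudes differ and sum to the transform of the fully open screen.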


9.3 Telescope Resolution 

An interesting application of diffraction theory is to compute the point spread function for a telescope. A well figured lens or mirror introduces a phase lag which turns plane waves from a distant on-axis source into spherical wavefronts. In the geometric optics limit $\lambda \to 0$ these converge to a point on the focal plane. What is the pattern of energy flux on the focal plane as a function of distance $x$ from the origin when we allow for finite wavelength?

Figure 9.5: A refracting telescope converts incoming planar wavefronts to circular wavefronts converging on the focal point $o$. The path length $L'$ from $\mathbf{r}$ to $\mathbf{x}$ is shorter than $L$ by an amount $\Delta L \simeq \boldsymbol{\theta} \cdot \mathbf{r}$ for small $\theta$. Thus the phase factor in the integration for the field amplitude at $\mathbf{x}$ is $e^{2\pi i\, \mathbf{x} \cdot \mathbf{r}/L\lambda}$.

If the focal length of the telescope is $L$ then a shift $\mathbf{x}$ in the focal plane corresponds to an angle $\boldsymbol{\theta} = \mathbf{x}/L$ so the variation in the path length from $\mathbf{x}$ to a point $\mathbf{r}$ on an incoming on-axis wavefront in the pupil plane is the same as the distance between an on-axis wavefront and a wave-front tilted by $\boldsymbol{\theta}$, so the phase factor is $\delta\psi = 2\pi\, \boldsymbol{\theta} \cdot \mathbf{r}/\lambda$ and the field amplitude is

$$ f(\mathbf{x}) \propto \int d^2r\, A(\mathbf{r})\, e^{2\pi i\, \mathbf{x} \cdot \mathbf{r}/L\lambda}. \tag{9.16} $$


Note that this is formally identical to the formula for the field amplitude in the Fraunhofer approximation. The focusing action of the lens makes the amplitude pattern on the focal plane identical (aside from a scale factor) to that computed in Fraunhofer theory for an observer at infinity for the same aperture, but without a lens.

Squaring the field amplitude, as usual, gives the focal-plane energy density pattern, or point spread function,

$$ g(\mathbf{x}) \propto |f(\mathbf{x})|^2 \propto |\tilde{A}(2\pi\mathbf{x}/L\lambda)|^2 \tag{9.17} $$

which says that the PSF $g(\mathbf{x})$ is the square of the transform of the aperture function evaluated at $\mathbf{k} = 2\pi\mathbf{x}/L\lambda$. Equivalently, the PSF is the transform of the auto-correlation function of the aperture function

$$ g(\mathbf{x}) \propto \int d^2z\, e^{2\pi i\, \mathbf{x} \cdot \mathbf{z}/L\lambda} \int d^2r\, A(\mathbf{r}) A^*(\mathbf{r} + \mathbf{z}) \tag{9.18} $$

where we have allowed for the fact that the aperture function $A(\mathbf{r})$ may contain a complex phase factor, if there are aberrations for instance.
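The equivalence of the two expressions for the PSF can be verified on a discrete periodic grid. The sketch below is a one-dimensional toy model: the aperture, its aberration and the grid size are arbitrary illustrative choices, and the amplitude is computed with the Fourier sign convention $e^{-2\pi i jm/n}$, which is the one that pairs with the $e^{+2\pi i zm/n}$ and $A(r)A^*(r+z)$ form of (9.18).

```python
import cmath
import math


def dft(a, sign):
    """O(n^2) discrete Fourier transform with kernel exp(sign * 2*pi*i*j*m/n)."""
    n = len(a)
    return [sum(a[j] * cmath.exp(sign * 2j * math.pi * j * m / n) for j in range(n))
            for m in range(n)]


n = 32
# Unit-transmission aperture over 8 <= j < 24 carrying an arbitrary phase aberration
aperture = [cmath.exp(0.3j * math.sin(0.5 * j)) if 8 <= j < 24 else 0j
            for j in range(n)]

# Route 1: square the transform of the aperture function (eq. 9.17)
psf_direct = [abs(f) ** 2 for f in dft(aperture, -1)]

# Route 2: transform the (circular) auto-correlation sum_j A_j A*_{j+z} (eq. 9.18)
autocorr = [sum(aperture[j] * aperture[(j + z) % n].conjugate() for j in range(n))
            for z in range(n)]
psf_from_autocorr = dft(autocorr, +1)

max_err = max(abs(p - q) for p, q in zip(psf_direct, psf_from_autocorr))

if __name__ == "__main__":
    print("largest discrepancy between the two routes:", max_err)
```

The two routes agree to rounding error, including for a complex (aberrated) aperture function.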


9.3.1 The Optical Transfer Function 


The Fourier transform of the PSF $\tilde{g}(\mathbf{k})$ is, according to (9.18), equal to the auto-correlation of the aperture evaluated at a lag of $\mathbf{z} = \mathbf{k}L\lambda/2\pi$. The transform of the PSF is known as the optical transfer function (OTF). Since the image $F_{\rm obs}(\mathbf{x})$ (i.e. the energy flux) formed on the focal plane is the convolution of the true image with the PSF:

$$ F_{\rm obs}(\mathbf{x}) = (F_{\rm true} \otimes g)_{\mathbf{x}} \tag{9.19} $$

then, by the convolution theorem, the transform of the focal plane image is

$$ \tilde{F}_{\rm obs}(\mathbf{k}) = \tilde{F}_{\rm true}(\mathbf{k})\, \tilde{g}(\mathbf{k}). \tag{9.20} $$

Often, there are other sources of blurring in the system. For example, in CCDs there is further 
image degradation due to charge diffusion, and telescope tracking errors give a further blurring. The 
utility of the OTF is that in such cases the OTF for the complete system is the product of the 
separate OTFs due to the optics, the detector, the tracking errors etc. 
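A simple way to see the multiplicative property at work is to model each source of blurring as a Gaussian, which is an assumption made here only for illustration (real optics, charge diffusion and tracking errors need not be Gaussian). The OTF of a Gaussian PSF of rms width $\sigma$ is $e^{-k^2\sigma^2/2}$, so multiplying OTFs adds the widths in quadrature.

```python
import math


def gaussian_otf(k, sigma):
    """Transform of a unit-flux Gaussian PSF of rms width sigma."""
    return math.exp(-0.5 * (k * sigma) ** 2)


def system_otf(k, sigmas):
    """OTF of the complete system: the product of the separate OTFs."""
    otf = 1.0
    for sigma in sigmas:
        otf *= gaussian_otf(k, sigma)
    return otf


if __name__ == "__main__":
    # Hypothetical blur widths (arbitrary units) for two independent effects
    sigmas = [0.3, 0.4]
    combined = math.hypot(*sigmas)     # 0.5: widths add in quadrature
    for k in (0.5, 1.0, 2.0, 4.0):
        print(k, system_otf(k, sigmas), gaussian_otf(k, combined))
```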

The choice of argument for the OTF can sometimes be a source of confusion. The wave-vector $\mathbf{k}$ has units of inverse length, whereas sometimes the OTF is taken to be the auto-correlation of the aperture, whose argument has units of length. The dimensionful conversion factor is $L\lambda/2\pi$. Also, the PSF is often considered to be a function of the (vector) angle on the sky: PSF $= g(\boldsymbol{\theta})$ with $\boldsymbol{\theta} = \mathbf{x}/L$, in which case the corresponding wave-vector $\mathbf{k}$ is dimensionless. In some cases the convention being used is clearly stated, but more often one has to infer what the argument units are by inspection.

One occasionally finds reference to the modulation transfer function (MTF). This is just the 
modulus of the OTF. 

9.3.2 Properties of the Telescope PSF and OTF 

Several generic properties of telescope PSFs and OTFs can be inferred from the above equations: 

• If the aperture has width $D_A$ then the PSF has width $\Delta x \sim L\lambda/D_A$ (in distance on the focal plane) or $\Delta\theta \sim \lambda/D_A$ in angle.

• Since the aperture is bounded in size, the OTF is bounded in frequency: $\tilde{g}(\mathbf{k}) = 0$ for $k > 2\pi D_A/L\lambda$, since the auto-correlation of the aperture vanishes for lags larger than $D_A$. This means that images formed in real telescopes always have band limited signal content. By virtue of the sampling theorem, there is then a critical sampling rate such that all of the information is preserved. Few optical telescopes, however, are sampled at the critical rate.

• Obstructions of the pupil such as secondary mirror support struts lead to extended linear features known as ‘diffraction spikes’ in the PSF. Similarly, sharp edges in the pupil lead to extended ‘wings’ of the PSF with $g(x) \propto 1/x^3$ for $x \gg \Delta x$. See figures 9.6 and 9.7 and accompanying captions for more discussion.

• High frequency figure errors also lead to extended wings on the PSF.

• Such wings are a serious impediment for e.g. extra-solar planet searches as the target planet can be swamped by scattered light from its parent star.

• The scattered light in the PSF wings can be greatly reduced by apodizing the pupil, though this is not often done. Careful choice of pupil can also help. For example, as shown in figures 9.6 and 9.7, square and circular pupils both produce azimuthally averaged $g(x) \propto 1/x^3$, but in the former case the scattered light is confined to narrow radial ‘spikes’, and so faint companion objects can still be detected in other parts of the image.
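The first two properties translate directly into numbers. The sketch below assumes only the scalings in the list: the PSF width is $\lambda/D_A$, and since the auto-correlation of the aperture vanishes for lags larger than $D_A$, the image contains no angular frequencies above $D_A/\lambda$ cycles per radian, so the sampling theorem gives a critical sample spacing of $\lambda/2D_A$. The function names and example telescope are illustrative.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi


def diffraction_width_arcsec(wavelength_m, aperture_m):
    """Order-of-magnitude PSF width, Delta-theta ~ lambda / D_A, in arcseconds."""
    return wavelength_m / aperture_m * ARCSEC_PER_RADIAN


def critical_pixel_arcsec(wavelength_m, aperture_m):
    """Critical (Nyquist) sample spacing lambda / (2 D_A), in arcseconds,
    for an image band-limited at D_A / lambda cycles per radian."""
    return 0.5 * diffraction_width_arcsec(wavelength_m, aperture_m)


if __name__ == "__main__":
    wavelength = 5e-7                      # metres (assumed)
    for aperture in (0.1, 1.0, 4.0):       # metres (assumed)
        print(aperture,
              diffraction_width_arcsec(wavelength, aperture),
              critical_pixel_arcsec(wavelength, aperture))
```

For a 1 m aperture at 500 nm this gives a diffraction width of about 0.1 arcsec and a critical pixel scale of about 0.05 arcsec.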


9.3.3 Random Phase Errors 

The telescope image is also affected by random phase errors. These arise either from fine-scale mirror roughness (imperfections arising in the grinding and/or polishing process) or from atmospheric turbulence, when mixing of air with varying entropy results in fluctuations in the refractive index.

Figure 9.6: A square pupil can be considered to be the product of two one dimensional box-car functions: $A(x, y) = W(x)W(y)$ and consequently its transform is the product of two ‘sinc’ functions: $\tilde{A}(k_x, k_y) = \tilde{W}(k_x)\tilde{W}(k_y)$ with $\tilde{W}(k) = \sin(kD_A/2)/(kD_A/2)$. This (real) function is plotted above, and its square is the PSF. The straight sharp edges of the pupil give rise to extended ‘diffraction spikes’. Now the amplitude of the ripples in the sinc function falls off asymptotically as $1/k$ for large argument, so the amplitude of the diffraction spikes falls as $g(x) \propto 1/x^2$. The width of these features is independent of $x$, however, so if we azimuthally average the PSF, the energy scattered to large radius falls as $1/x^3$. As we shall see, the same is true for a circular aperture (see figure 9.7).

Phase errors, random or otherwise, can be treated by introducing a phase factor $C(\mathbf{r}) = e^{i\varphi(\mathbf{r})}$ which multiplies the pupil function. The field amplitude is then

$$ f(\mathbf{x}) \propto \widetilde{(AC)}. \tag{9.21} $$


Mirror Roughness 

If the phase errors due to mirror roughness are small, $\varphi(\mathbf{r}) \ll 1$, as is commonly the case, one can approximate $C(\mathbf{r})$ as $C(\mathbf{r}) \simeq 1 + i\varphi(\mathbf{r})$. The field amplitude is then the sum of two terms; the first being simply that computed from the pupil function alone, while the second is $f(\mathbf{x}) \propto \widetilde{(A\varphi)}$.

Mirror roughness can be an important contributor to the wings of the PSF. The field amplitude is the convolution of $\tilde{A}$ with $\tilde{\varphi}$ evaluated, as usual, at $\mathbf{k} = 2\pi\mathbf{x}/L\lambda$. Well outside of the diffraction limited core, i.e. at $x \gg \lambda L/D_A$, the PSF is dominated by the mirror roughness term and

$$ g(\mathbf{x}) \sim |\tilde{\varphi}(\mathbf{k} = 2\pi\mathbf{x}/L\lambda)|^2 \tag{9.22} $$

i.e. the PSF is proportional to the power spectrum of the phase fluctuations: $g(\mathbf{x}) \propto P_\varphi(\mathbf{k}(\mathbf{x}))$.

Empirically, the telescope PSF is often found to have an extended aureole with $g \propto 1/x^2$. This contribution will dominate over edge effects ($g \propto 1/x^3$) and atmospheric effects ($g \propto 1/x^{11/3}$), if present, at very large radii. To generate this requires a power-law phase-error power spectrum $P_\varphi(k) \propto 1/k^2$. This is two-dimensional flicker noise, with equal contribution to the phase variance from each logarithmic range of frequencies.

Figure 9.7: The PSF for a circular pupil can be computed exactly in terms of Bessel functions, from which one can show that, asymptotically, the wings of the PSF scale like $1/r^3$. To see how this comes about, and how this result obtains quite generally for any pupil with a sharp edge, consider computing the transform of the pupil at some wave-number $k = 2\pi/\lambda$ with direction as indicated. The (real part of the) transform is just the product of the pupil function with a plane wave $\cos(\mathbf{x} \cdot \mathbf{k})$, and similarly for the imaginary part. If $\lambda \ll R$ then there will tend to be very good cancellation between the positive and negative half-cycles of the wave. The strongest contribution comes from the last half-cycle, which does not get cancelled. If the pupil is circular then the length of this sliver is $\sim \sqrt{R\lambda}$ while its width is $\sim \lambda$, so the net contribution to the transform is $\tilde{A}(k) \sim R^{1/2}\lambda^{3/2} \propto k^{-3/2}$. Squaring this gives the asymptotic behaviour for the PSF: $g(x) = |\tilde{A}(k = 2\pi x/L\lambda)|^2 \propto 1/x^3$.

In general, the total scattered light is proportional to the phase error variance (for flicker noise this is logarithmically divergent, as is the integral $\int d^2x\, g(\mathbf{x})$). This has a non-negligible effect on aperture photometry, especially when comparing results from different telescopes with differing mirror properties.

If the fine-scale mirror surface errors can be approximated as a statistically homogeneous process then the PSF is the power spectrum of $\varphi(\mathbf{r})$ ‘windowed’ with $A(\mathbf{r})$, and it follows that the aureole is speckly with microscale equal to the diffraction limited PSF width.


Atmospheric Turbulence 

Atmospheric turbulence is more difficult to analyse, as the phase fluctuations are not numerically small. For pure Kolmogorov turbulence, the phase structure function is $S_\varphi(r) = \langle (\varphi(\mathbf{r}) - \varphi(0))^2 \rangle = 6.88\,(r/r_0)^{5/3}$ where $r_0$ is the Fried length. On scales larger than $r_0$ the phase errors are large compared to unity. A realization of a wave-front after passing through a turbulent atmosphere is shown in figure 9.8.

Figure 9.8: A realisation of a wavefront after passing through a turbulent atmosphere. Kolmogorov turbulence produces refractive index fluctuations $\delta n$ with a power law power-spectrum $P_n(k) \propto k^{-11/3}$, and the same is true of the 2-dimensional phase (this being a projection of the refractive index). The phase fluctuations are strongly ‘infra-red divergent’, whereas the mean deflection, averaged over a patch of a given size, diverges at small scales.

For big telescopes it is usually the case that the Fried length is much less than the pupil diameter $D$. So again $A(\mathbf{r})C(\mathbf{r})$ takes the form of an infinite random field fluctuating on scale $r_0$ modulated, or ‘windowed’, by the aperture function. The transform of $AC$ is therefore, on general grounds, an incoherent function $\tilde{C}$ with overall extent $\sim 1/r_0$ convolved, or smoothed, with $\tilde{A}$, and the intensity $|f(\mathbf{x})|^2$ will consist of a set of random speckles, each of the size of the PSF for a diffraction limited telescope, but spread over a larger area of the focal plane. A realisation of such a PSF is shown in figure 9.9.

Now the phase fluctuation screen varies with time (primarily due to being convected across the pupil by wind) and this means that the speckles tend to dance around, with a relatively short time-scale, and if one integrates longer than this, as is usually the case, then the result is a smeared out PSF with angular resolution $\theta \sim \lambda/r_0$. This is the resolving power of a diffraction limited telescope of aperture diameter $D \sim r_0$.

Figure 9.9: A realisation of a PSF arising from atmospheric seeing with Kolmogorov turbulence. In a diffraction limited telescope there are two important length scales: the wavelength $\lambda$ and the aperture diameter $D$. The angular resolution, which is of course dimensionless, is $\Delta\theta \sim \lambda/D$. Refractive index fluctuations in the atmosphere introduce an additional length scale: the Fried length $r_0$. This is such that the root mean squared difference in phase-error for two points with separation $r_0$ is on the order of one radian. In the atmosphere dominated PSF, each speckle is of the order of the diffraction limit, but the overall angular width of the PSF is $\Delta\theta \sim \lambda/r_0$.
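The two angular scales in the seeing-dominated PSF are easily evaluated. The sketch below uses only the scalings quoted above: overall width $\lambda/r_0$, speckle width $\lambda/D$, and hence roughly $(D/r_0)^2$ speckles in the instantaneous image. The function names and the example values of $r_0$ and $D$ are illustrative assumptions.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi


def seeing_arcsec(wavelength_m, fried_length_m):
    """Overall width of the long-exposure PSF, lambda / r_0, in arcseconds."""
    return wavelength_m / fried_length_m * ARCSEC_PER_RADIAN


def speckle_count(aperture_m, fried_length_m):
    """Rough number of speckles: (seeing width / speckle width)^2 = (D / r_0)^2."""
    return (aperture_m / fried_length_m) ** 2


if __name__ == "__main__":
    wavelength = 5e-7                 # metres (assumed)
    for r0 in (0.1, 0.2, 0.4):        # metres (assumed Fried lengths)
        print(r0, seeing_arcsec(wavelength, r0), speckle_count(4.0, r0))
```

A Fried length of 10 cm at 500 nm corresponds to seeing of about one arcsecond, regardless of how large the telescope is.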

For long integrations, it is possible to subtract an azimuthally averaged PSF to help reveal faint companions for example. However, if $N$, the number of photons per speckle per speckle coherence time, is large, then the fluctuations about the mean profile will be larger than the photon counting limit by a factor $\sqrt{N}$. This excess variance is called speckle noise.

For sufficiently bright sources it is possible to observe the instantaneous speckle pattern, and from this deduce properties of the source on scales comparable to the diffraction limited resolution. This is called speckle interferometry.


9.4 Image Wander 


The wave-front deformation $h(\mathbf{r})$ generated by Kolmogorov turbulence (figure 9.8) has lots of low-frequency power. The region of the wave-front covering the telescope pupil, of size $D$, will have an average slope $\nabla h \sim \sqrt{\langle (h(0) - h(D))^2 \rangle}/D$, corresponding to an angular deflection of the same size. The phase fluctuation is $\delta\varphi \sim h/\lambda$, so the typical net shift in the angular position is expected to be $\delta\theta \sim S_\varphi(D)^{1/2} \lambda/D$. With $S_\varphi(r) = 6.88\,(r/r_0)^{5/3}$ this is $\delta\theta \sim (\lambda/r_0)(r_0/D)^{1/6}$. The overall angular width of the PSF on the other hand is $\Delta\theta \sim \lambda/r_0$. Thus, for an extremely large telescope, the net deflection of the image position caused by the atmosphere becomes negligible. The ratio of the net shift to the PSF width falls off quite slowly, only as the $-1/6$ power of the diameter. For modest size telescopes, a substantial part of the atmospheric image degradation comes from the so-called image wander. This is important, since such image wander, as well as image motion induced by the wobbling of the wind-buffeted telescope, can be taken out by tip-tilt correction. Traditionally this has been achieved by introducing a wobbling mirror, or glass plate, with a servo loop to freeze the motion of a continuously monitored guide star. More recently it has become possible to take out the image motion by shuffling the accumulating electronic charge around the CCD.
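To get a feel for how slowly the relative importance of image wander declines with aperture, the order-of-magnitude ratio $\delta\theta/\Delta\theta \sim (r_0/D)^{1/6}$ can be tabulated. This sketch drops all numerical coefficients (including the 6.88), so only the trend with $D/r_0$ is meaningful; the function name and example values are illustrative.

```python
def wander_to_seeing_ratio(aperture_m, fried_length_m):
    """Order-of-magnitude ratio of image wander to PSF width, (r_0 / D)^(1/6).

    Numerical coefficients of order unity are deliberately ignored.
    """
    return (fried_length_m / aperture_m) ** (1.0 / 6.0)


if __name__ == "__main__":
    r0 = 0.1                                   # metres (assumed Fried length)
    for aperture in (0.1, 0.4, 1.6, 6.4):      # metres (assumed)
        print(aperture, wander_to_seeing_ratio(aperture, r0))
```

Increasing the aperture by a factor of 64 only halves the ratio, which is why tip-tilt correction remains worthwhile over a wide range of telescope sizes.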

The above estimate of the deflection (that it is just some kind of average of the wavefront slope across the pupil) is rather hand waving. We can make this more precise if we consider the centroid of the PSF. The starting point is the expression for the PSF as the Fourier transform of the OTF (9.18). We will write this as

$$ g(\mathbf{x}) = \int d^2z\, e^{2\pi i\, \mathbf{x} \cdot \mathbf{z}/L\lambda}\, \chi(\mathbf{z}) \tag{9.23} $$

with

$$ \chi(\mathbf{z}) = \int d^2r\, A(\mathbf{r}) A(\mathbf{r} + \mathbf{z})\, e^{i(\varphi(\mathbf{r}) - \varphi(\mathbf{r} + \mathbf{z}))} \tag{9.24} $$


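i.e. the auto-correlation function of the real pupil function $A$ times the atmospheric phase factor $e^{i\varphi}$. The first moment of the PSF is

$$ \int d^2x\, \mathbf{x}\, g(\mathbf{x}) = \int d^2z\, \chi(\mathbf{z}) \int d^2x\, \mathbf{x}\, e^{2\pi i\, \mathbf{x} \cdot \mathbf{z}/L\lambda} \tag{9.25} $$

$$ = \frac{L\lambda}{2\pi i} \int d^2x \int d^2z\, \chi(\mathbf{z})\, \nabla_z e^{2\pi i\, \mathbf{x} \cdot \mathbf{z}/L\lambda} \tag{9.26} $$

$$ = -\frac{L\lambda}{2\pi i} \int d^2x \int d^2z\, e^{2\pi i\, \mathbf{x} \cdot \mathbf{z}/L\lambda}\, \nabla_z \chi \tag{9.27} $$

$$ = -\frac{L\lambda}{2\pi i} (2\pi)^2 \int d^2z\, \delta(2\pi\mathbf{z}/L\lambda)\, \nabla\chi \tag{9.28} $$

$$ = \frac{i (L\lambda)^3}{2\pi} (\nabla\chi)_{\mathbf{z} = 0} \tag{9.29} $$

where we have first used the fact that $\nabla_z e^{2\pi i\, \mathbf{x} \cdot \mathbf{z}/L\lambda} = (2\pi i\, \mathbf{x}/L\lambda)\, e^{2\pi i\, \mathbf{x} \cdot \mathbf{z}/L\lambda}$. We then integrated by parts to shift the gradient operator from the complex exponential to $\chi$. We then recognized the $\mathbf{x}$-integral on the third line as a Dirac $\delta$-function.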

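The zeroth moment of the PSF is similarly found to be

$$ \int d^2x\, g(\mathbf{x}) = \int d^2z\, \chi(\mathbf{z}) \int d^2x\, e^{2\pi i\, \mathbf{x} \cdot \mathbf{z}/L\lambda} = (L\lambda)^2\, \chi(\mathbf{z} = 0). \tag{9.30} $$

The centroid of the PSF is defined as

$$ \bar{\mathbf{x}} \equiv \frac{\int d^2x\, \mathbf{x}\, g(\mathbf{x})}{\int d^2x\, g(\mathbf{x})} = \frac{iL\lambda}{2\pi} \left( \frac{\nabla\chi}{\chi} \right)_{\mathbf{z} = 0}. \tag{9.31} $$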
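Now the gradient of (9.24) is

$$ \nabla_z \chi = \nabla_z \int d^2r\, A(\mathbf{r})\, e^{i\varphi(\mathbf{r})} A(\mathbf{r} + \mathbf{z})\, e^{-i\varphi(\mathbf{r} + \mathbf{z})} \tag{9.32} $$

$$ = \int d^2r\, A(\mathbf{r})\, e^{i\varphi(\mathbf{r})} \left[ \nabla A(\mathbf{r} + \mathbf{z}) - iA(\mathbf{r} + \mathbf{z}) \nabla\varphi(\mathbf{r} + \mathbf{z}) \right] e^{-i\varphi(\mathbf{r} + \mathbf{z})} \tag{9.33} $$

If we now evaluate this at $\mathbf{z} = 0$ we have

$$ (\nabla\chi)_{\mathbf{z} = 0} = \int d^2r\, A(\mathbf{r}) \nabla A(\mathbf{r}) - i \int d^2r\, A^2(\mathbf{r}) \nabla\varphi. \tag{9.34} $$

The first term is the integral of a total derivative $A(\mathbf{r}) \nabla A(\mathbf{r}) = \nabla A^2/2$, which therefore vanishes, and, with (9.31), we finally obtain

$$ \bar{\mathbf{x}} = \frac{L\lambda}{2\pi}\, \frac{\int d^2r\, W(\mathbf{r}) \nabla\varphi(\mathbf{r})}{\int d^2r\, W(\mathbf{r})} \tag{9.35} $$

where $W(\mathbf{r}) = A^2(\mathbf{r})$ is the square of the pupil function (and which is equal to $A(\mathbf{r})$ in the usual case that the pupil transmission is either unity or zero).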
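The centroid formula can be tested in a simple one-dimensional numerical experiment. The sketch below is a toy model under several assumptions: the pupil is Gaussian-apodized (so that the first moment of the PSF converges comfortably), the grid is periodic with $L\lambda$ equal to $n$ samples, the field amplitude is computed with the Fourier sign convention $e^{-2\pi i jm/n}$ (the one that matches (9.23) and (9.24)), and the phase is a tilt plus a symmetric quadratic ‘defocus’ term. The tilt is chosen so that the predicted centroid is exactly 5 focal-plane samples; the defocus term has zero weighted mean slope and should not move the centroid at all.

```python
import cmath
import math

n = 128
center = n // 2
sigma = 8.0          # rms width of the (hypothetical) Gaussian pupil, in samples
tilt_shift = 5       # tilt that should move the image by 5 focal-plane samples
defocus = 0.01       # coefficient of the symmetric quadratic phase term

d = [j - center for j in range(n)]                         # pupil coordinate
pupil = [math.exp(-0.5 * (x / sigma) ** 2) for x in d]     # real pupil function A
phase = [2 * math.pi * tilt_shift * x / n + defocus * x * x for x in d]
slope = [2 * math.pi * tilt_shift / n + 2 * defocus * x for x in d]  # d(phase)/dx

# PSF: squared modulus of f(m) = sum_j A_j exp(i phi_j) exp(-2 pi i j m / n)
ms = range(-n // 2, n // 2)
psf = []
for m in ms:
    f = sum(pupil[j] * cmath.exp(1j * (phase[j] - 2 * math.pi * j * m / n))
            for j in range(n))
    psf.append(abs(f) ** 2)

# Measured centroid of the PSF
centroid = sum(m * g for m, g in zip(ms, psf)) / sum(psf)

# Prediction of eq. (9.35) with L*lambda = n and W = A^2
weights = [a * a for a in pupil]
predicted = n / (2 * math.pi) * sum(w * s for w, s in zip(weights, slope)) / sum(weights)

if __name__ == "__main__":
    print("measured centroid :", centroid)
    print("predicted centroid:", predicted)
```

Although the defocus term broadens the PSF considerably, the measured centroid agrees with the $W$-weighted mean slope, illustrating that the centroid is linear in the phase even though the PSF itself is not.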

Thus the PSF centroid position is indeed a weighted average of the slope of the wavefront deformation or phase-error across the pupil. What is remarkable about this result is that while the actual PSF is a highly non-linear function of the phase-error (since the phase appears in complex exponential factors) the centroid is a linear function of $\varphi$. This means that if the phase-error has Gaussian statistics, as the central limit theorem would encourage us to believe is the case, then the centroid is also a Gaussian random variable. Given the power-law form for the power spectrum for the phase-error, one can, for instance, compute the temporal power-spectrum or auto-correlation function of the deflection $\bar{\mathbf{x}}(t)$ (problem-TBD). Given a model for the distribution of seeing with altitude one can also compute the co-variance of the deflection for different stars. This is all very convenient and elegant. However, we should point out that the centroid statistic, while mathematically appealing, is quite useless in practical applications as it has terrible noise properties because the integral must be taken over the entire noisy image. What is usually done in reality is to smooth the image with some kernel with shape similar to the average PSF and then locate the peak. Unfortunately, computing the statistical properties of these smoothed-peak motions is much more complicated, and these motions do not have precisely Gaussian statistics. The behaviour of such realistic position estimators can be found from simple numerical experiments, as one can easily generate a large realization of a phase-screen like that in figure 9.8 and then drag this across a model pupil and compute the PSF as a function of time numerically. Such experiments, and indeed analytic reasoning, suggest that the statistical properties of the smoothed-peak motions are quite similar to those of the centroid, but that the high temporal frequency behavior is rather different.


9.5 Occultation Experiments 

The Fresnel scale has a useful significance in occultation experiments. Let us assume that one monitors a large number of distant stars to look for occultations by objects in the solar system. This approach is limited by diffraction effects. In the discussion of the knife-edge, we saw that the waves which interfere constructively cover a ‘Fresnel zone’ in the occulting plane of size $\sim \sqrt{D\lambda}$, and so strong occultation is expected only for objects which are larger than

$$ L \simeq \sqrt{D\lambda} \propto (D/\mathrm{AU})^{1/2}. \tag{9.36} $$
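To put numbers to the Fresnel limit, the scale $\sqrt{D\lambda}$ can be evaluated for occulters at various distances. The sketch below assumes optical light at $\lambda = 5 \times 10^{-7}$ m, a value chosen here for illustration, and uses the IAU value of the astronomical unit; the function name is illustrative.

```python
import math

AU_M = 1.495978707e11    # astronomical unit in metres


def fresnel_scale_m(distance_m, wavelength_m):
    """Fresnel scale sqrt(D * lambda) in metres."""
    return math.sqrt(distance_m * wavelength_m)


if __name__ == "__main__":
    wavelength = 5e-7                          # metres (assumed)
    for d_au in (1.0, 5.0, 40.0, 1.0e4):       # occulter distances in AU (assumed)
        scale = fresnel_scale_m(d_au * AU_M, wavelength)
        print(f"D = {d_au:8.1f} AU   Fresnel scale = {scale / 1e3:.2f} km")
```

With these assumptions the Fresnel scale is roughly 0.27 km at 1 AU and about 1.7 km at 40 AU, so objects much smaller than a kilometre or so in the outer solar system cannot produce deep optical occultations.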


9.6 Scintillation 

Consider radiation at wavelength $\lambda$ from a point source propagating a distance $D$ through a random refractive medium. Let the medium consist of clouds or domains of size $r_c$ with random fractional refractive index fluctuations $\delta n/n$.

The light crossing time for a domain is $t_c = r_c/c$, and the fluctuations $\delta n$ introduce fractional changes in the light crossing time $\delta t/t \sim \delta n/n$ leading to corrugation of the wavefronts of amplitude $\delta h \sim c\,\delta t \sim r_c\, \delta n/n$, corresponding to angular tilt (induced by a single cloud) of $\delta\theta_1 \sim \delta n/n$. The total deflection will be the sum of $N \sim D/r_c$ random deflections or

$$ \delta\theta \sim \sqrt{N}\, \delta\theta_1 \sim \sqrt{D/r_c}\; \delta n/n \tag{9.37} $$

(see figure 9.10).

If the refractive index fluctuations are sufficiently strong and/or if the path length is sufficiently long, this will lead to multi-path propagation. In the geometric optics regime, the condition for multi-path propagation is

$$ D\,\delta\theta \gtrsim r_c. \tag{9.38} $$

For geometric optics to be valid we require that the Fresnel scale be smaller than the separation of the rays

$$ r_f = \sqrt{D\lambda} \ll D\,\delta\theta \tag{9.39} $$

since only if this condition is satisfied do we really have well defined separate paths. If this condition is satisfied then in the vicinity of the observer there will be a superposition of waves with slightly different angles, resulting in a periodic interference pattern. If the observer is moving relative to this pattern, as is generally the case, this will result in oscillations in the apparent brightness of the source. The spatial scale of these oscillations is $\Delta s \sim \lambda/\delta\theta$.

In the case of the ISM, this mechanism, known as diffractive interstellar scintillation or DISS, can yield important information about the nature of the intervening medium even if the angular splitting is too small to be resolved. Imagine, for instance, one observes a clean sinusoidal oscillation in brightness. From this one can infer that there are two paths interfering. From the time-scale of the oscillations, together with some estimate of the observer’s velocity relative to the fringe-pattern, one can infer $\Delta s$ and from this the angular splitting $\delta\theta \sim \lambda/\Delta s$, and from this, together with some estimate of the distance of the source, one can infer $r_c \sim D\,\delta\theta$, the characteristic size of the clouds. This in turn tells us $N \sim D/r_c$, the number of clouds along a line of sight, and hence one can infer the strength of the refractive index fluctuations $\delta n/n \sim \delta\theta/\sqrt{N}$.
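The chain of inferences just described is a sequence of one-line order-of-magnitude estimates, which can be written out explicitly. In the sketch below the function name, the dictionary keys and the input numbers (a metre-wavelength radio source at roughly a kiloparsec with a $10^4$ km fringe scale) are all hypothetical, chosen only to exercise the formulae; the geometric optics check (9.39) is applied as a simple inequality rather than a strong one.

```python
import math


def diss_inference(fringe_scale_m, wavelength_m, distance_m):
    """Order-of-magnitude properties of the medium inferred from a DISS fringe scale."""
    splitting = wavelength_m / fringe_scale_m         # delta-theta ~ lambda / Delta-s
    cloud_size = distance_m * splitting               # r_c ~ D * delta-theta
    n_clouds = distance_m / cloud_size                # N ~ D / r_c
    dn_over_n = splitting / math.sqrt(n_clouds)       # delta-n/n ~ delta-theta / sqrt(N)
    fresnel = math.sqrt(distance_m * wavelength_m)    # r_f = sqrt(D * lambda)
    return {
        "splitting_rad": splitting,
        "cloud_size_m": cloud_size,
        "n_clouds": n_clouds,
        "dn_over_n": dn_over_n,
        "geometric_optics_ok": fresnel < distance_m * splitting,   # eq. (9.39)
    }


if __name__ == "__main__":
    # Hypothetical inputs: 1e7 m fringes, 1 m wavelength, 3e19 m (about 1 kpc) path
    for key, value in diss_inference(1e7, 1.0, 3e19).items():
        print(f"{key:22s} {value}")
```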

Note that this mechanism requires that the source be small: $\delta\theta_s \lesssim \Delta s/D$. Conversely, this mechanism can provide information about the sizes of sources which cannot be resolved.

If the geometric optics condition is violated then the diffraction pattern will be washed out. There may still be fluctuations in the source brightness, but these are caused by the net amplification averaged over a cigar shaped volume of width $\sim r_f$. This tends to be weaker, and occurs on a much longer time-scale. This is called refractive interstellar scintillation or RISS.

The atmosphere also causes scintillation: the ‘twinkling’ of stars. Atmospheric turbulence causes
refractive index fluctuations with spectral index $n = -11/3$, and the two-dimensional structure
function for phase fluctuations is therefore $\langle(\delta\psi)^2\rangle \sim (r/r_0)^{5/3}$. In good observing conditions, and
in the optical, the phase-correlation length (or Fried length) is on the order of $r_0 \sim 40$ cm. The
amplitude of the wavefront corrugations on scale $r$ is therefore $\delta h \sim \lambda (r/r_0)^{5/6}$ and consequently
the angular deflection is $\delta\theta \sim \delta h/r \sim \lambda r_0^{-5/6} r^{-1/6}$. If the turbulence is at altitude $D$ (with typical
value $D \sim 5$ km) then, in the geometric optics limit, the amplification $A$ is on the order of

$$ A(r) \sim \frac{D\,\delta\theta}{r} \sim D \lambda\, r_0^{-5/6} r^{-7/6} \tag{9.40} $$


with $A \sim 1$ indicating the onset of caustics and multi-path propagation. However, for $\lambda \sim 5 \times 10^{-7}$ m
and $D \sim 5$ km, the Fresnel length is $r_f \sim 5$ cm, which is larger than the scale for which geometric
optics gives $A = 1$ since


$$ \frac{r(A = 1)}{r_f} \sim \left(\frac{r_f}{r_0}\right)^{5/7} \tag{9.41} $$


which is considerably smaller than unity for the parameters here. This very rough order of magnitude
conclusion is supported by more accurate calculations (e.g. Roddier and co-workers) which show
that under these ‘good seeing’ conditions the amplification should be about 10%, and these theoretical
results are supported by measurements of the scintillation index. In bad seeing, the Fried length is
considerably smaller, and the scintillation can then be substantial.

Figure 9.10: Upper panel illustrates the retardation of wavefronts $\delta h$ as they pass through an over-dense
cloud with refractive index perturbation $\delta n$. An example ray is also shown. The middle panel
illustrates the deflection of rays by a collection of randomly placed clouds. In this case the positions
have contrived to deflect the rays so that the observer at right sees two images of the source. The
lower panel shows a blow up of the region around the observer. Wavefronts of the two interfering
signals are shown, along with the observer velocity vector and the transverse distance over which
the signal from the source is modulated.
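The numbers in the atmospheric estimate can be reproduced directly. The sketch below assumes the scalings quoted above, $r_f = \sqrt{D\lambda}$ and $A(r) \sim D\lambda\, r_0^{-5/6} r^{-7/6}$, so that setting $A = 1$ gives $r(A=1)/r_f = (r_f/r_0)^{5/7}$; the function names are ours, and the inputs are the ‘good seeing’ values used in the text.

```python
import math

def fresnel_length(distance, wavelength):
    """Fresnel scale r_f = sqrt(D * lambda); inputs in the same length unit."""
    return math.sqrt(distance * wavelength)

def caustic_scale_ratio(r_f, r_0):
    """r(A=1) / r_f = (r_f / r_0)^(5/7), from A(r) ~ D*lambda*r_0^(-5/6)*r^(-7/6) = 1."""
    return (r_f / r_0) ** (5.0 / 7.0)

wavelength_m = 5e-7      # optical wavelength
altitude_m = 5e3         # height of the turbulent layer
fried_length_m = 0.40    # Fried length in good seeing

r_f = fresnel_length(altitude_m, wavelength_m)
ratio = caustic_scale_ratio(r_f, fried_length_m)

# Cross-check: solve A(r) = 1 directly for r and compare with r_f.
r_unit_amplification = (altitude_m * wavelength_m
                        * fried_length_m ** (-5.0 / 6.0)) ** (6.0 / 7.0)

print(f"r_f = {100 * r_f:.1f} cm, r(A=1)/r_f = {ratio:.2f}")
```

Since the ratio is well below unity, the scale on which geometric optics would predict caustics lies inside the Fresnel zone, which is the sense in which geometric optics fails here.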


9.7 Transition to Geometric Optics 


Consider a simple single-element reflecting telescope whose otherwise perfect mirror suffers from a
smooth aberration or figure error $h(\mathbf r)$ as indicated in figure 9.11.

Figure 9.11: In the geometric optics limit, rays reflected off a mirror with a slight aberration suffer a
small anomalous deflection $\delta\boldsymbol\theta$ given by the gradient of the wavefront deformation $h(\mathbf r)$ (so the height
of the mirror deviation is $h(\mathbf r)/2$) and arrive at the focal plane at $\mathbf x(\mathbf r) = L\nabla h(\mathbf r)$. Caustic surfaces
intersect the focal plane at the turning points of the deflection, where $|\partial^2 h/\partial r_i \partial r_j|$ vanishes.


According to geometric optics, waves from a distant on-axis source reflecting off the mirror at $\mathbf r$
will suffer an anomalous deflection

$$ \delta\boldsymbol\theta(\mathbf r) = \nabla h(\mathbf r) \tag{9.42} $$

and reach the focal plane at a displaced location

$$ \mathbf x(\mathbf r) = L\,\delta\boldsymbol\theta(\mathbf r) = L\nabla h(\mathbf r) \tag{9.43} $$

with $L$ the focal length, and the energy flux of these rays is

$$ F(\mathbf x) \propto \left|\frac{\partial x_j}{\partial r_i}\right|^{-1} \propto \left|\frac{\partial^2 h}{\partial r_i \partial r_j}\right|^{-1}. \tag{9.44} $$

This results in the usual caustic surfaces at points in the focal plane corresponding to turning points
of the deflection.

For long wavelengths, on the other hand, the width of the PSF is $L\lambda/D \gg L\nabla h$ and the
aberration is negligible.

Let us analyze this transition from wave-optics to the geometric-optics limit from the point of
view of diffraction theory. The field amplitude at point $\mathbf x$ on the focal plane is

$$ f(\mathbf x) \propto \int d^2r\, A(\mathbf r)\, e^{i\psi(\mathbf r;\mathbf x)} \tag{9.45} $$

with $A(\mathbf r)$ the aperture or pupil function and $\psi(\mathbf r;\mathbf x) = 2\pi(h(\mathbf r) - \mathbf x\cdot\mathbf r/L)/\lambda$.

As the amplitude of the phase fluctuations due to the aberration becomes large compared to unity
we expect that the greatest contribution to the field amplitude will come from points in the pupil
plane where the phase $\psi$ is nearly stationary, that is, at extrema of $\psi(\mathbf r)$. These are points $\mathbf r_0$ where
the gradient of the aberration function $h(\mathbf r)$ happens to coincide with the gradient of the planar
function $\mathbf x\cdot\mathbf r/L$, or where

$$ (\nabla h)_{\mathbf r_0} = \mathbf x/L. \tag{9.46} $$

These are just the reflection points of geometric optics.

In the vicinity of such a point $\mathbf r_0$ we have, for the phase error,

$$ \delta\psi \simeq \frac{\pi}{\lambda}\, r'_i r'_j\, \frac{\partial^2 h(\mathbf r_0)}{\partial r_i \partial r_j} \tag{9.47} $$

with $\mathbf r' = \mathbf r - \mathbf r_0$. Changing integration variable in the amplitude integral to $\mathbf r'$ and dropping the
prime gives

$$ f(\mathbf x) \propto e^{i 2\pi h(\mathbf r_0)/\lambda} \int d^2r\, A(\mathbf r_0 + \mathbf r)\, e^{i\pi h_{ij} r_i r_j/\lambda} \tag{9.48} $$

with $h_{ij} = \partial^2 h/\partial r_i \partial r_j$. Inspecting the exponential factor in the integral we expect the dominant
contribution to come from a region of size $\delta r \sim \sqrt{\lambda/h''}$. This ‘Fresnel zone’ becomes smaller with
decreasing $\lambda$, so for most points $\mathbf r_0(\mathbf x)$ we can neglect the limitation of the aperture function $A(\mathbf r)$.
Transforming to a rotated frame in which $h_{ij}$ is diagonal,

$$ h_{ij} = \begin{pmatrix} h_{xx} & 0 \\ 0 & h_{yy} \end{pmatrix} \tag{9.49} $$

and the field amplitude becomes

$$ f(\mathbf x) = e^{i 2\pi h(\mathbf r_0)/\lambda} \int dx\, e^{i\pi h_{xx} x^2/\lambda} \int dy\, e^{i\pi h_{yy} y^2/\lambda}. \tag{9.50} $$


Now, on dimensional grounds the $x$-integral has value $\sqrt{\lambda/\pi h_{xx}}$ times some factor of order unity,
and similarly for the $y$-integral, so on squaring the wave amplitude we obtain the energy flux

$$ |f(\mathbf x)|^2 \propto 1/(h_{xx} h_{yy}) \tag{9.51} $$

but this is just the inverse of the determinant of the matrix $h_{ij}$, which is a rotational invariant, so
in general we have

$$ |f(\mathbf x)|^2 \propto \left|\frac{\partial^2 h}{\partial r_i \partial r_j}\right|^{-1} \tag{9.52} $$

again in agreement with the geometric optics result (9.44).
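The dimensional argument for the one-dimensional Fresnel-zone integral $\int dx\, e^{i\pi h_{xx} x^2/\lambda}$ can be checked numerically. The sketch below is a plain trapezoid sum over a finite range of 20 Fresnel zones on either side (the range and step count are arbitrary choices of ours); its modulus should come out close to $\sqrt{\lambda/h_{xx}}$, i.e. $\sqrt{\lambda/\pi h_{xx}}$ times the order-unity factor $\sqrt{\pi}$, up to a percent-level truncation error from the finite limits.

```python
import cmath
import math

def fresnel_zone_integral(h_over_lambda, half_width_in_zones=20.0, steps_per_zone=2000):
    """Trapezoid estimate of the integral of exp(i*pi*(h''/lambda)*x^2) over |x| < X.

    X is set to `half_width_in_zones` times the Fresnel-zone size sqrt(lambda/h'').
    """
    scale = 1.0 / math.sqrt(h_over_lambda)        # Fresnel-zone size sqrt(lambda/h'')
    half_width = half_width_in_zones * scale
    n = int(2 * half_width_in_zones * steps_per_zone)
    dx = 2.0 * half_width / n
    total = 0j
    for k in range(n + 1):
        x = -half_width + k * dx
        weight = 0.5 if k in (0, n) else 1.0      # trapezoid end-point weights
        total += weight * cmath.exp(1j * math.pi * h_over_lambda * x * x)
    return total * dx

amp_1 = fresnel_zone_integral(1.0)   # h''/lambda = 1 (arbitrary units)
amp_4 = fresnel_zone_integral(4.0)   # curvature four times larger
print(abs(amp_1), abs(amp_4) / abs(amp_1))
```

Quadrupling the curvature halves the amplitude, so the product of the $x$- and $y$-integrals squared scales as $1/(h_{xx}h_{yy})$, as used in (9.51).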


The argument above is somewhat misleading in that we have simply squared the amplitude
arising from a single Fresnel zone, whereas in general we expect there to be a superposition of the
amplitudes from several zones, each of which will have a different phase factor $e^{i 2\pi h(\mathbf r_0)/\lambda}$. This
results in an illumination pattern on the focal plane with a periodic structure, with alternating
zones of constructive and destructive interference, as illustrated in figure 9.12.


9.8 Problems 

9.8.1 Diffraction. 

You wish to construct a pinhole camera from a box of side $L = 10$ cm to take a photograph at a
wavelength $\lambda = 4 \times 10^{-5}$ cm. Estimate the diameter of the pinhole required to obtain the sharpest
image. 


9.8.2 Fraunhofer 

A perfectly absorbing disk of radius $R$ lies in a collimated beam of monochromatic radiation of
wavelength $\lambda$ and absorbs energy at a rate of 1 erg/s. What is the net energy flux in the radiation
scattered out of the original beam direction by the disk according to Fraunhofer diffraction theory?

[Figure 9.12 panels: FIGURE ERROR; AMPLIFICATION; COSINE(PHASE); GEOMETRIC OPTICS; WAVE OPTICS]

Figure 9.12: Geometric vs wave optics. Upper left panel shows an example of a quasi-random
figure error $h(\mathbf r)$. This is a sample of smoothed Gaussian random noise windowed by the circular
aperture function. Upper right panel shows the amplification $A = |\partial^2 h/\partial r_i \partial r_j|^{-1}$ according to
geometric optics. Lower left panel shows the cosine of the phase $\cos(\psi(\mathbf r))$ with $\psi(\mathbf r) = 2\pi h(\mathbf r)/\lambda$.
The complex field amplitude at a point $\mathbf x$ in the focal plane is equal to the Fourier transform
of $\cos(\psi) + i\sin(\psi)$. Lower right panel shows the PSF computed according to diffraction theory
by squaring the field amplitude while the lower center panel shows the PSF for this figure error
in the geometric optics limit. The sharp, well-defined features in the wave-optics PSF arise from
regions in the pupil plane where $\cos(\psi)$ has a quasi-monochromatic variation over relatively extended
regions, and are associated with the ‘critical-curves’ seen in the amplification pattern. These features
correspond to the geometrical optical caustics. The periodic oscillations in the wave-optics PSF arise
from interference between radiation arising from distinct zones on the pupil plane.

9.8.3 Telescope PSF from Fresnel Integral 

Consider a large ground based telescope of aperture $A$ and focal length $D$ observing at wavelength
$\lambda$. Assume that the optics have been configured to give diffraction limited performance. This means
that planar wavefronts from a distant source on axis interfere constructively at the origin $\mathbf x = 0$ of
the focal plane with negligible phase shifts (or equivalently that the optical path length from $\mathbf x = 0$
to any point $\mathbf r$ on the entrance pupil plane is constant to a precision better than $\lambda$). Similarly,
wavefronts from a source an angle $\boldsymbol\theta$ off axis will interfere constructively at $\mathbf x = D\boldsymbol\theta$.

a. Show from the foregoing considerations that the optical path length from $\mathbf x$ to $\mathbf r$ is given by

$$ l = {\rm constant} + \mathbf x\cdot\mathbf r/D \tag{9.53} $$

b. At any instant, the wave front from a distant point source will have been distorted by fluctuations
in the density of the turbulent atmosphere through which it has passed. Let the vertical
displacement of the wavefront in the entrance pupil be $h(\mathbf r)$. Use Fresnel diffraction theory to
show that the amplitude for the radiation field on the focal plane is

$$ a(\mathbf x) \propto \int d^2r\, C(\mathbf r)\, e^{2\pi i\, \mathbf x\cdot\mathbf r/D\lambda} \tag{9.54} $$

where $C(\mathbf r) = e^{i\varphi(\mathbf r)}$ and the phase shift is $\varphi(\mathbf r) = 2\pi h(\mathbf r)/\lambda$. Thus show that for a sufficiently
long integration time the intensity pattern on the focal plane (the ‘point spread function’ or
psf) is

$$ g(\mathbf x) = \langle |a(\mathbf x)|^2 \rangle = \int d^2r\, \xi_C(\mathbf r)\, e^{2\pi i\, \mathbf x\cdot\mathbf r/D\lambda} \tag{9.55} $$

where $\xi_C(\mathbf r) = \langle C(\mathbf r' + \mathbf r) C^*(\mathbf r') \rangle$. Show that the Fourier transform of the psf (the ‘optical
transfer function’) is therefore given by

$$ \tilde g(\mathbf k) = \int d^2x\, g(\mathbf x)\, e^{i\mathbf k\cdot\mathbf x} = \xi_C(\mathbf k D\lambda/2\pi) \tag{9.56} $$

c. Assuming the distortion of the wavefront $h(\mathbf r)$ takes the form of a 2-dimensional gaussian
random field, one can show that $\xi_C(\mathbf r) = \langle e^{i(\varphi_1 - \varphi_2)} \rangle = e^{-S_\varphi(\mathbf r)/2}$ where $S_\varphi(\mathbf r) = \langle (\varphi(\mathbf r') -
\varphi(\mathbf r' + \mathbf r))^2 \rangle$ is the ‘structure function’. For fully developed Kolmogorov turbulence the structure
function is a power law $S_\varphi(r) = (r/r_0)^{5/3}$ where $r_0$ is the ‘Fried length’ over which the rms phase
change is unity. Compute the OTF in this case, and estimate the (angular) width of the psf
in terms of $r_0$, $\lambda$.

d. Assuming that the amplitude of the wavefront deformation $h(\mathbf r)$ is independent of wavelength
show that the resolution scales as the 1/5 power of wavelength.

e. Derive the relation $\langle e^{i(\varphi_1 - \varphi_2)} \rangle = e^{-S_\varphi(\mathbf r)/2}$ for a gaussian random field $\varphi(\mathbf r)$.

Chapter 10

Radiation from Moving Charges 


10.1 Electromagnetic Potentials 

Maxwell’s equations (7.4) involve the six variables $\mathbf E(\mathbf r, t)$ and $\mathbf B(\mathbf r, t)$. They can be reformulated in
terms of 4 potentials $\phi(\mathbf r, t)$, $\mathbf A(\mathbf r, t)$.

The second of Maxwell’s equations $\nabla\cdot\mathbf B = 0$ tells us that $\mathbf B$ is the curl of some vector $\mathbf A$:

$$ \mathbf B = \nabla\times\mathbf A. \tag{10.1} $$

Using this in Faraday’s law $\nabla\times\mathbf E = -\frac{1}{c}\,\partial\mathbf B/\partial t$ (M3) tells us that

$$ \nabla\times\left(\mathbf E + \frac{1}{c}\frac{\partial\mathbf A}{\partial t}\right) = 0 \tag{10.2} $$

so the quantity in parentheses is the gradient of some scalar function $-\phi$, or

$$ \mathbf E = -\nabla\phi - \frac{1}{c}\frac{\partial\mathbf A}{\partial t}. \tag{10.3} $$

These potentials are not unique, since if we make the gauge transformation

$$ \begin{aligned} \mathbf A &\rightarrow \mathbf A' = \mathbf A + \nabla\psi \\ \phi &\rightarrow \phi' = \phi - \dot\psi/c \end{aligned} \tag{10.4} $$

where $\psi(\mathbf r, t)$ is an arbitrary function this leaves the physical fields $\mathbf E$, $\mathbf B$ unchanged.

Using (10.3), Gauss’ law $\nabla\cdot\mathbf E = 4\pi\rho$ becomes

$$ \nabla^2\phi - \frac{1}{c^2}\ddot\phi + \frac{1}{c}\frac{\partial(\nabla\cdot\mathbf A + \dot\phi/c)}{\partial t} = -4\pi\rho \tag{10.5} $$

and Ampere’s law (M4) similarly becomes

$$ \nabla^2\mathbf A - \frac{1}{c^2}\frac{\partial^2\mathbf A}{\partial t^2} - \nabla(\nabla\cdot\mathbf A + \dot\phi/c) = -4\pi\mathbf j/c \tag{10.6} $$

The terms involving the quantity $\zeta = \nabla\cdot\mathbf A + \dot\phi/c$ can be removed by a suitable gauge transformation,
since, under the transformation (10.4)

$$ \zeta \rightarrow \zeta' = \zeta + \Box\psi \tag{10.7} $$

where we have defined the d’Alembertian or wave operator

$$ \Box = \nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}. \tag{10.8} $$

Figure 10.1: Illustration of the source term $Q(t)$ for the retarded potential. We carve space up into
cells (left panel) and then solve for the field due to a single cell (shown highlighted). For a single
moving charge, for example, the source term $Q(t)$ would be a short pulse (right). Having solved for
the field from one such microscopic cell we then invoke the linearity of the field equations to add
together the solutions for all of the cells.
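The two claims made for the gauge transformation $\mathbf A \rightarrow \mathbf A + \nabla\psi$, $\phi \rightarrow \phi - \dot\psi/c$, namely that it leaves $\mathbf E$ and $\mathbf B$ unchanged and that it shifts $\zeta = \nabla\cdot\mathbf A + \dot\phi/c$ by $\Box\psi$, follow in a couple of lines. The working below is a sketch in the Gaussian-unit conventions used in this chapter; it assumes only that $\psi$ is smooth enough for partial derivatives to commute.

```latex
\begin{align}
\mathbf B' &= \nabla\times(\mathbf A + \nabla\psi)
            = \nabla\times\mathbf A = \mathbf B
            \qquad (\text{since } \nabla\times\nabla\psi = 0), \\
\mathbf E' &= -\nabla\Big(\phi - \tfrac{1}{c}\dot\psi\Big)
              - \tfrac{1}{c}\frac{\partial}{\partial t}\big(\mathbf A + \nabla\psi\big)
            = -\nabla\phi - \tfrac{1}{c}\frac{\partial\mathbf A}{\partial t} = \mathbf E, \\
\zeta'     &= \nabla\cdot(\mathbf A + \nabla\psi)
              + \tfrac{1}{c}\frac{\partial}{\partial t}\Big(\phi - \tfrac{1}{c}\dot\psi\Big)
            = \zeta + \nabla^2\psi - \tfrac{1}{c^2}\ddot\psi
            = \zeta + \Box\psi .
\end{align}
```

The cancellation in the second line is between $+\nabla\dot\psi/c$ from the scalar potential and $-\partial(\nabla\psi)/c\,\partial t$ from the vector potential.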


Since $\psi(\mathbf r, t)$ is arbitrary, it can be chosen to make $\zeta + \Box\psi$ vanish:

$$ \nabla\cdot\mathbf A + \frac{1}{c}\frac{\partial\phi}{\partial t} = 0 \tag{10.9} $$

which is called the Lorentz gauge condition.

With this choice of gauge, Maxwell’s equations take the very simple form:

$$ \begin{aligned} \Box\mathbf A &= -4\pi\mathbf j/c \\ \Box\phi &= -4\pi\rho \end{aligned} \tag{10.10} $$

In the absence of charges both $\mathbf A$ and $\phi$ obey the wave equation, and the terms on the RHS are
source terms.

See appendix [E] for a review of invariance of electromagnetism.


10.2 Retarded Potentials 


The equations (10.10) are linear in the fields and in the charges. This means that the solutions for a
superposition of source terms can be obtained by summing the solutions for the various components.
In particular, we can describe a general charge/current distribution by carving space up into a grid
of infinitesimal cubical cells and giving the charge within, and current passing through, each cell
(see figure 10.1).

Consider one infinitesimal cell, which we can place at the origin of our coordinates. Let the
charge within the cell be $Q = \sum q$ and the current be $\mathbf J = \sum q\mathbf v$. These will be, in general, functions
of time, so Maxwell’s equations are

$$ \begin{aligned} \Box\mathbf A &= -4\pi\mathbf J(t)\delta(\mathbf r)/c \\ \Box\phi &= -4\pi Q(t)\delta(\mathbf r) \end{aligned} \tag{10.11} $$

The four source terms here are clearly spherically symmetric, and the solutions will share this
symmetry, so $\phi(\mathbf r, t) = \phi(r, t)$ and $A_x(\mathbf r, t) = A_x(r, t)$ etc. To find the solutions, we use the Laplacian
for a spherically symmetric function

$$ \nabla^2 f(r) = \frac{1}{r^2}\frac{d(r^2\, df/dr)}{dr} \tag{10.12} $$

(Problem: show this.) Applying this to the potential, and making the change of variable $\phi(r, t) =
\chi(r, t)/r$ we find that for $r \ne 0$ the function $\chi(r, t)$ satisfies the 1-dimensional wave equation

$$ \chi'' - \ddot\chi/c^2 = 0 \tag{10.13} $$

where $\chi'' = \partial^2\chi/\partial r^2$. This has the general solution

$$ \chi(r, t) = \chi_1(t - r/c) + \chi_2(t + r/c) \tag{10.14} $$

where $\chi_1$, $\chi_2$ are two arbitrary functions which must be chosen to satisfy appropriate boundary
conditions. The first term here describes an outward propagating wave, which is physically reasonable;
the second describes an inward propagating wave, and we discard it.

Finally, we need to find $\chi_1(t)$ which gives the correct Coulombic behavior in the immediate
vicinity of the source: $\phi(r, t) \rightarrow Q(t)/r$ as $r \rightarrow 0$. This requires $\chi_1(t) = Q(t)$. Similarly, requiring
that the magnetic potential tend to the magneto-static form gives $\mathbf A(r, t) \rightarrow \mathbf J(t)/cr$ and the
solutions to Maxwell’s equations are

$$ \begin{aligned} \mathbf A(\mathbf r, t) &= \mathbf J(t - r/c)/cr \\ \phi(\mathbf r, t) &= Q(t - r/c)/r \end{aligned} \tag{10.15} $$

Problem: show that $\mathbf A(\mathbf r) = \mathbf J/cr$ reproduces the Biot-Savart law with $\mathbf B = \nabla\times\mathbf A$.
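That a potential of the form $Q(t \mp r/c)/r$ satisfies the source-free wave equation away from the origin can be confirmed with a quick finite-difference check. This is a sketch only: we work in units with $c = 1$, take the hypothetical source $Q(t) = \sin\omega t$, and use the spherically symmetric Laplacian $(1/r^2)\,d(r^2\,df/dr)/dr$ with centred differences; the step size is an arbitrary choice.

```python
import math

C = 1.0  # speed of light; units with c = 1 are an assumption of this sketch

def phi_retarded(r, t, omega=3.0):
    """Trial potential Q(t - r/c)/r with the hypothetical source Q(t) = sin(omega t)."""
    return math.sin(omega * (t - r / C)) / r

def phi_advanced(r, t, omega=3.0):
    """The discarded inward-propagating solution Q(t + r/c)/r."""
    return math.sin(omega * (t + r / C)) / r

def wave_operator(f, r, t, h=1e-3):
    """Finite-difference estimate of (1/r^2) d/dr(r^2 df/dr) - (1/c^2) d^2f/dt^2."""
    # r^2 df/dr evaluated half a step either side of r
    flux_plus = (r + h / 2) ** 2 * (f(r + h, t) - f(r, t)) / h
    flux_minus = (r - h / 2) ** 2 * (f(r, t) - f(r - h, t)) / h
    laplacian = (flux_plus - flux_minus) / (h * r ** 2)
    d2f_dt2 = (f(r, t + h) - 2.0 * f(r, t) + f(r, t - h)) / h ** 2
    return laplacian - d2f_dt2 / C ** 2

residual_retarded = wave_operator(phi_retarded, 2.0, 0.3)
residual_advanced = wave_operator(phi_advanced, 2.0, 0.3)
print(residual_retarded, residual_advanced)
```

Both residuals are at the level of the discretisation error, which illustrates that the wave equation alone does not select the retarded solution; that choice is the causal boundary condition imposed in the text.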

These are the solutions for a single infinitesimal source cell located at the origin. Summing over
all cells gives the potentials at some arbitrary ‘field point’ $\mathbf r$ as 3-dimensional integrals

$$ \begin{aligned} \mathbf A(\mathbf r, t) &= \frac{1}{c}\int d^3r'\, \frac{\mathbf j(\mathbf r', t - |\mathbf r - \mathbf r'|/c)}{|\mathbf r - \mathbf r'|} \\ \phi(\mathbf r, t) &= \int d^3r'\, \frac{\rho(\mathbf r', t - |\mathbf r - \mathbf r'|/c)}{|\mathbf r - \mathbf r'|} \end{aligned} \tag{10.16} $$

These are called the retarded potentials. They explicitly give the potentials $\phi$, $\mathbf A$ produced by an
arbitrary charge distribution, and, with (10.1), (10.3) the fields $\mathbf E$, $\mathbf B$.

The potentials (10.16) are just like the electro- and magneto-static potentials, but where the
source is evaluated at the retarded time.

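To check that these solutions do in fact obey the Lorentz gauge conditions, rewrite these with a
change of integration variable to $\mathbf r'' = \mathbf r - \mathbf r'$ so

$$ \nabla\cdot\mathbf A = \frac{1}{c}\nabla\cdot\int d^3r''\, \frac{\mathbf j(\mathbf r - \mathbf r'', t - r''/c)}{r''} = \frac{1}{c}\int \frac{d^3r''}{r''}\, (\nabla\cdot\mathbf j)_{\mathbf r - \mathbf r'',\, t - r''/c} \tag{10.17} $$

and similarly $\dot\phi = \int d^3r''\, \dot\rho(\mathbf r - \mathbf r'', t - r''/c)/r''$ and so

$$ \nabla\cdot\mathbf A + \dot\phi/c = \frac{1}{c}\int \frac{d^3r''}{r''}\, (\nabla\cdot\mathbf j + \dot\rho)_{\mathbf r - \mathbf r'',\, t - r''/c} \tag{10.18} $$

which vanishes due to charge conservation.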

One can add to these any solution of the homogeneous wave equations, to describe, for instance, 
radiation from external sources. 


10.3 Liénard-Wiechert Potentials


The Liénard-Wiechert potentials are the specialization of (10.16) to the case of a single particle of
charge $q$ moving along a path $\mathbf r_0(t)$ with velocity $\mathbf v = \dot{\mathbf r}_0$. To obtain these, we first rewrite (10.16) as
4-dimensional space-time integrals by introducing a $\delta$-function. For the scalar potential this gives

$$ \phi(\mathbf r, t) = \int d^3r' \int dt'\, \frac{\rho(\mathbf r', t')\, \delta(t' - t + |\mathbf r - \mathbf r'|/c)}{|\mathbf r - \mathbf r'|} \tag{10.19} $$

For a point charge, the charge density is

$$ \rho(\mathbf r', t') = q\,\delta(\mathbf r' - \mathbf r_0(t')) \tag{10.20} $$

which allows us to perform the spatial integration to give

$$ \phi(\mathbf r, t) = q \int dt'\, \frac{\delta(t' - t + |\mathbf r - \mathbf r_0(t')|/c)}{|\mathbf r - \mathbf r_0(t')|} \tag{10.21} $$

or, with $\mathbf R(t') = \mathbf r - \mathbf r_0(t')$ and $R = |\mathbf R|$, we can write this as

$$ \phi(\mathbf r, t) = q \int dt'\, \frac{\delta(t' - t + R(t')/c)}{R(t')} \tag{10.22} $$

This equation says that the potential seen by an observer at point $\mathbf r$ and time $t$ only ‘knows’ about
the state of the particle at the specific instant $t'$ on its trajectory. That instant is when $c(t - t')$
happens to coincide with the distance to the particle $R(t')$. This is the point in space-time at which
the particle’s trajectory intersects the past light cone of the observer (see figure 10.2). It should be
obvious from this figure that, provided the particle’s velocity does not exceed the speed of light,
there can only be one such point. This time, the solution of the equation $t' + R(t')/c = t$, is called
the retarded time $t' = t_{\rm ret}(t)$.

The usual procedure is now to formally manipulate the $\delta$-function using the property of the Dirac
$\delta$-function that $\delta(F(t') - t) = \delta(t' - F^{-1}(t))/F'(t')$ where $F(t)$ is an arbitrary continuous function,
$F^{-1}$ is the inverse function, and $F'$ denotes the derivative of $F$. (To prove this simply perform a
Taylor expansion of $F(t')$ around $t' = t_{\rm ret}(t) = F^{-1}(t)$.) This gives an integral over $dt'$ involving
$\delta(t' - t_{\rm ret}(t))$ which is then easily performed. Here we will take a more pedestrian, though entirely
equivalent, approach. Let’s calculate the potential at the observer’s location, averaged over a small
(eventually infinitesimal) time interval $0 < t < \tau$:

$$ \langle\phi\rangle = \frac{1}{\tau}\int_0^\tau dt\, \phi(\mathbf r, t) = \frac{q}{\tau}\int \frac{dt'}{R(t')} \int_0^\tau dt\, \delta(t' - t + R(t')/c). \tag{10.23} $$


The second integral has value unity if $0 < t' + R(t')/c < \tau$, and zero otherwise (since in the latter case
the $\delta$-function falls outside of the domain of integration). The time averaged potential is therefore

$$ \langle\phi\rangle = \frac{q}{\tau}\int_{t_{\rm ret}(0)}^{t_{\rm ret}(\tau)} \frac{dt'}{R(t')} \simeq \frac{q}{R(t_{\rm ret})}\frac{\Delta t'}{\tau} \tag{10.24} $$

with $\Delta t' = t_{\rm ret}(\tau) - t_{\rm ret}(0)$, which is the coordinate time it takes the particle to pass from the
past light cone of the point $(\mathbf r, 0)$ to that of $(\mathbf r, \tau)$. This approximate equality becomes exact as the
averaging interval $\tau \rightarrow 0$.

We need therefore to calculate the ratio $\Delta t'/\tau$, since the actual potential is just the regular
coulombic potential $\phi = q/R$ times this factor. Inspection of figure 10.2 shows that for a stationary
charge $\Delta t'/\tau = 1$; for a charge moving away from the observer at $v = c$, $\Delta t'/\tau = 1/2$ while for a
charge moving towards the observer, $\Delta t'/\tau \rightarrow \infty$ as $v \rightarrow c$. To obtain the general result, simply
insert $t_{\rm ret}(\tau) = t_{\rm ret}(0) + \Delta t'$ into $t_{\rm ret}(\tau) + R(t_{\rm ret}(\tau))/c = \tau$ (which is the equation defining what
we mean by the retarded time) to give

$$ t_{\rm ret}(0) + \Delta t' + R(t_{\rm ret}(0) + \Delta t')/c = \tau. \tag{10.25} $$


Doing a Taylor expansion: $R(t_{\rm ret}(0) + \Delta t') \rightarrow R(t_{\rm ret}(0)) + \Delta t'\, dR/dt'$ and using $t_{\rm ret}(0) + R(t_{\rm ret}(0))/c =
0$ yields

$$ \Delta t' = \left(1 + \frac{1}{c}\frac{dR}{dt'}\right)^{-1} \tau. \tag{10.26} $$

Now since by definition $R(t') = \sqrt{(\mathbf r - \mathbf r_0(t'))\cdot(\mathbf r - \mathbf r_0(t'))}$, the derivative here is $dR(t')/dt' =
-\hat{\mathbf R}(t')\cdot\dot{\mathbf r}_0(t') = -\hat{\mathbf R}(t')\cdot\mathbf v(t')$, and we have

$$ \frac{\Delta t'}{\tau} = \frac{1}{1 - \hat{\mathbf R}(t')\cdot\mathbf v(t')/c} \tag{10.27} $$

Multiplying the Coulomb potential by this factor, and treating the magnetic potential in the same
way, gives the Liénard-Wiechert potentials

$$ \phi(\mathbf r, t) = \left[\frac{q}{R\,(1 - \hat{\mathbf R}\cdot\mathbf v/c)}\right]_{t' = t_{\rm ret}} \tag{10.28} $$

$$ \mathbf A(\mathbf r, t) = \left[\frac{q\,\mathbf v}{cR\,(1 - \hat{\mathbf R}\cdot\mathbf v/c)}\right]_{t' = t_{\rm ret}} \tag{10.29} $$

Figure 10.2: Space-time diagram used to derive the Liénard-Wiechert potentials. The bold vertical
line represents the observer (taken here to be stationary) and the heavy curve represents the
moving charged particle. The lines at 45 degrees are the past light cones of the points on the
observer’s world line $(\mathbf r, t = 0)$ and $(\mathbf r, t = \tau)$. The charged particle intercepts the past light cone
of the point $(\mathbf r, t)$ only once, at a retarded time $t'$ such that $t' + R(t')/c = t$ and at distance $R(t')$.
We show in the text that the potential seen by the observer is just equal to the Coulomb potential
$\phi = q/R(t')$ times the factor $\Delta t'/\tau$, with $\Delta t'$ the (coordinate) time taken to pass between the two
past light cones. This factor ranges from 1/2, for a particle moving very rapidly ($v \sim c$) away from
the observer, to arbitrarily large for a relativistic particle moving directly towards the observer. The
general formula for this ratio is indicated.

Figure 10.3: The geometry for calculation of dipole radiation from a small region of space (circle)
containing charges such that the net current $\mathbf J$ is parallel to the $z$-axis. The ‘field point’ at large
distance $r$ is indicated. In the calculation it is assumed that the charge distribution is effectively
static over the time it takes for radiation to cross the system.


A key feature of the L-W potentials (10.28), (10.29) is that the potentials, and hence also the
fields, become very large when $1 - \hat{\mathbf R}\cdot\mathbf v/c$ is very small. This happens when a highly energetic particle
is moving almost directly towards the observer, as one would expect from relativistic beaming. The
potentials also provide a useful starting point to discuss such issues as the transition from coulomb
potential to radiation at large distances. We will use them in the treatment of synchrotron and
Cerenkov radiation.
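The retarded-time construction is easy to explore numerically. The following sketch solves $t' + R(t')/c = t$ by bisection and compares a finite-difference estimate of $\Delta t'/\tau$ with the factor $1/(1 - \hat{\mathbf R}\cdot\mathbf v/c)$. The function names, the units with $c = 1$, and the three example trajectories (uniform motion towards and away from the observer at $v = 0.5c$, and a circular orbit) are all illustrative assumptions of ours, and the solver relies on $|v| < c$ so that the root is unique.

```python
import math

C = 1.0  # speed of light in the units of this sketch

def retarded_time(t, r_obs, trajectory, tol=1e-12):
    """Solve t' + |r_obs - r0(t')|/c = t for t' by bisection (assumes |v| < c)."""
    def g(tp):
        return tp + math.dist(r_obs, trajectory(tp)) / C - t
    hi = t                    # g(t) = R/c >= 0
    lo = t - 1.0
    while g(lo) > 0.0:        # step back in time until the root is bracketed
        lo = t - 2.0 * (t - lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def time_compression(t, r_obs, trajectory, velocity, tau=1e-4):
    """Return (finite-difference dt'/tau, analytic 1/(1 - Rhat.v/c)) at observer time t."""
    t0 = retarded_time(t, r_obs, trajectory)
    t1 = retarded_time(t + tau, r_obs, trajectory)
    numeric = (t1 - t0) / tau
    r0 = trajectory(t0)
    dist = math.dist(r_obs, r0)
    rhat_dot_v = sum((a - b) * v for a, b, v in zip(r_obs, r0, velocity(t0))) / dist
    return numeric, 1.0 / (1.0 - rhat_dot_v / C)

OBSERVER = (10.0, 0.0, 0.0)

def toward(tp):            # charge approaching along x at v = 0.5 c
    return (0.5 * tp, 0.0, 0.0)

def toward_velocity(tp):
    return (0.5, 0.0, 0.0)

def away(tp):              # charge receding along x at v = 0.5 c
    return (-0.5 * tp, 0.0, 0.0)

def away_velocity(tp):
    return (-0.5, 0.0, 0.0)

def orbit(tp):             # unit-radius circular orbit with speed 0.5 c
    return (math.cos(0.5 * tp), math.sin(0.5 * tp), 0.0)

def orbit_velocity(tp):
    return (-0.5 * math.sin(0.5 * tp), 0.5 * math.cos(0.5 * tp), 0.0)

for name, path, vel in [("toward", toward, toward_velocity),
                        ("away", away, away_velocity),
                        ("orbit", orbit, orbit_velocity)]:
    print(name, time_compression(0.0, OBSERVER, path, vel))
```

For motion along the line of sight the retarded time is linear in $t$, so the two estimates agree to the bisection tolerance; for the orbit they differ only by the first-order error of the finite difference.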


10.4 Dipole Radiation 


Imagine we have some oscillating charge distribution within a small region of space, with the net
current aligned with the $z$-axis as illustrated in figure 10.3. The potential and field solutions will
approximate those for a single microscopic cell (10.15). If the current is

$$ \mathbf J(t) = J_0 \hat{\mathbf z}\, e^{i\omega t} \tag{10.30} $$

then the magnetic potential is

$$ \mathbf A(\mathbf r, t) = J_0 \hat{\mathbf z}\, e^{i\omega(t - r/c)}/cr \tag{10.31} $$

The magnetic field is given by taking the curl: $\mathbf B = \nabla\times\mathbf A$. For a static field ($\omega \rightarrow 0$), the spatial
derivatives act only on the $1/r$ factor, giving a field falling as $B \propto 1/r^2$ (the Biot-Savart law). With a
time varying current, the spatial derivatives also act on the complex exponential, so $\partial A \sim (i\omega/c) J_0/r$
giving only a $B \propto 1/r$ fall-off. At sufficiently large distances the $1/r$ component comes to dominate,
and this is the radiation field, with energy density $\propto E^2 = B^2 \propto 1/r^2$.

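More precisely, we have $\partial_x e^{-i\omega r/c} = (-i\omega x/cr)\, e^{-i\omega r/c}$ and hence, keeping only the derivatives of
the exponential,

$$ \mathbf B = \nabla\times\mathbf A = \frac{J_0 e^{i\omega t}}{cr} \begin{vmatrix} \hat{\mathbf x} & \hat{\mathbf y} & \hat{\mathbf z} \\ \partial_x & \partial_y & \partial_z \\ 0 & 0 & e^{-i\omega r/c} \end{vmatrix} = \frac{i\omega J_0\, e^{i\omega(t - r/c)}}{c^2 r} \begin{pmatrix} -y/r \\ x/r \\ 0 \end{pmatrix} \tag{10.32} $$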


• The magnetic field is perpendicular to the current ($\mathbf B\cdot\hat{\mathbf z} = 0$), and also to $\mathbf r$, the direction of
propagation (as required for transverse waves).

• The amplitude of the magnetic field is greatest for points with $z = 0$, ie around the equator,
and is zero for points along the $z$-axis. The energy density is $\propto |\mathbf B|^2$ and is proportional to
$\sin^2\theta$, with $\theta$ the polar angle.

• The electric field is trickier to calculate this way, but we know that it must be perpendicular
to $\mathbf B$ and to the line of sight $\mathbf r$, and equal in magnitude. It is oriented parallel to the projection
of the current vector on the sky.

• The pre-factor $i\omega$ tells us that the field is proportional to the rate of change of the current.
Since the current is proportional to the charge velocity, this means that the radiation field is
proportional to the charge acceleration.

• The current is given by $\mathbf J = \sum q\mathbf v = \partial(\sum q\mathbf r)/\partial t$ so the field is proportional to the second time
derivative of the dipole moment $\mathbf d = \sum q\mathbf r$.

For an arbitrary small system of charges, where the current need not align with the $z$-axis,
the spectral decomposition of the radiation field is then

(10.33)


10.5 Larmor’s Formula 

If the current is due to a single charge, we have for the Poynting vector

$$ S = \frac{c}{8\pi}(E^2 + B^2) = \frac{c}{4\pi}B^2 = \frac{\dot J^2}{4\pi r^2 c^3}\sin^2\theta = \frac{q^2\dot u^2}{4\pi r^2 c^3}\sin^2\theta \tag{10.34} $$

with $u$ the charge velocity (we have derived this for a single harmonic component, but it is true in
general by Parseval’s theorem).

The energy radiated into direction $d\Omega$ per unit time is $S r^2 d\Omega$ hence

$$ \frac{dW}{dt\,d\Omega} = \frac{q^2\dot u^2}{4\pi c^3}\sin^2\theta \tag{10.35} $$

and integrating over direction gives the total power

$$ P = \frac{dW}{dt} = \frac{q^2\dot u^2}{4\pi c^3}\, 2\pi \int_{-1}^{+1} d\mu\,(1 - \mu^2) \tag{10.36} $$

the integral is elementary and leads to Larmor’s formula for the power radiated by an accelerated
charge:

$$ P = \frac{2 q^2 \dot u^2}{3 c^3}. \tag{10.37} $$
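The angular integral behind Larmor’s formula can be verified with a few lines of arithmetic. The sketch below (function names ours) integrates $\sin^2\theta$ over solid angle with a midpoint rule in $\mu = \cos\theta$ and compares $q^2\dot u^2/4\pi c^3$ times the result with $2q^2\dot u^2/3c^3$; the unit values of $q$, $\dot u$ and $c$ are placeholders.

```python
import math

def larmor_angular_integral(n=2000):
    """Midpoint-rule estimate of the integral of sin^2(theta) over all solid angle."""
    dmu = 2.0 / n
    total = 0.0
    for k in range(n):
        mu = -1.0 + (k + 0.5) * dmu        # mu = cos(theta)
        total += (1.0 - mu * mu) * dmu     # sin^2(theta) = 1 - mu^2
    return 2.0 * math.pi * total           # azimuthal integral contributes 2*pi

def larmor_power(q, accel, c):
    """Total radiated power P = 2 q^2 a^2 / (3 c^3) in Gaussian units."""
    return 2.0 * q ** 2 * accel ** 2 / (3.0 * c ** 3)

solid_angle_integral = larmor_angular_integral()
power_from_integral = 1.0 / (4.0 * math.pi) * solid_angle_integral   # q = a = c = 1
print(solid_angle_integral, 8.0 * math.pi / 3.0)
print(power_from_integral, larmor_power(1.0, 1.0, 1.0))
```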


10.6 General Multi-pole Expansion 

We showed above that a small region containing a collection of moving charges generates a radiation 
field proportional to the second time derivative of the dipole moment of the charge. In obtaining 
this we implicitly assumed that the variation of the system over the time it takes for light to cross 
it is negligible, so the result is valid only for sufficiently small regions and velocities. This dipole 
approximation is just one term in a more general expansion for the radiation from a collection of 
charges. 

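To derive the general multi-pole expansion we start with (10.16) and rewrite the expression for
the magnetic potential as a 4-dimensional space time integral by introducing a temporal $\delta$-function:

$$ \mathbf A(\mathbf r, t) = \frac{1}{c}\int d^3r' \int dt'\, \frac{\mathbf j(\mathbf r', t')\, \delta(t' - t + |\mathbf r - \mathbf r'|/c)}{|\mathbf r - \mathbf r'|} \tag{10.38} $$

Next, define the temporal Fourier transforms of the current and potential to be

$$ \mathbf j_\omega(\mathbf r) = \int dt\, \mathbf j(\mathbf r, t)\, e^{i\omega t} \tag{10.39} $$

and

$$ \mathbf A_\omega(\mathbf r) = \int dt\, \mathbf A(\mathbf r, t)\, e^{i\omega t} \tag{10.40} $$

to obtain

$$ \begin{aligned} \mathbf A_\omega(\mathbf r) &= \frac{1}{c}\int dt \int d^3r' \int dt'\, \frac{\mathbf j(\mathbf r', t')\, \delta(t' - t + |\mathbf r - \mathbf r'|/c)}{|\mathbf r - \mathbf r'|}\, e^{i\omega t} \\ &= \frac{1}{c}\int d^3r'\, \mathbf j_\omega(\mathbf r')\, \frac{e^{ik|\mathbf r - \mathbf r'|}}{|\mathbf r - \mathbf r'|} \end{aligned} \tag{10.41} $$

The transform of the potential at some frequency $\omega$ is thus the convolution of $\mathbf j_\omega(\mathbf r)$ with the spherical
wave function $e^{ikr}/r$, with $k = \omega/c$. This is reminiscent of diffraction theory.

Figure 10.4: For a bounded charge distribution of size $L$ and a distant observer at $r \gg L$ one can
approximate the distance $|\mathbf r' - \mathbf r| \simeq r - \mathbf n\cdot\mathbf r'$.

Now let’s place the origin of coordinates within the charge distribution of extent $r' \sim L$, and
place the field point at some distance $r \gg L$ as illustrated in figure 10.4. For $r \gg L$ we can replace
$|\mathbf r - \mathbf r'|$ in the denominator with $r$ and expand the distance factor in the complex exponential as
$|\mathbf r - \mathbf r'| \simeq r - \mathbf n\cdot\mathbf r'$, with $\mathbf n$ the unit vector in the direction of the field point, to obtain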


$$ \mathbf A_\omega(\mathbf r) = \frac{e^{ikr}}{cr}\int d^3r'\, \mathbf j_\omega(\mathbf r')\, e^{-ik\mathbf n\cdot\mathbf r'} \tag{10.42} $$

and expanding the exponential in the integral we have

$$ \mathbf A_\omega(\mathbf r) = \frac{e^{ikr}}{cr}\sum_{n=0}^{\infty}\frac{1}{n!}\int d^3r'\, \mathbf j_\omega(\mathbf r')\, (-ik\mathbf n\cdot\mathbf r')^n \tag{10.43} $$

This is a series expansion in $k\mathbf n\cdot\mathbf r' \sim L/\lambda$. This parameter will be small (and so the series will
rapidly converge) provided $L \ll \lambda$, or equivalently if the light crossing-time for the system is small
compared to the period of the radiation, which will usually be the case if the charges are moving
non-relativistically.

The lowest order term in the expansion is $n = 0$ for which

$$ \mathbf A^{(0)}_\omega(\mathbf r) = \frac{e^{ikr}}{cr}\int d^3r'\, \mathbf j_\omega(\mathbf r') = \frac{e^{ikr}}{cr}\int dt\, e^{i\omega t}\, \dot{\mathbf d}(t) = -\frac{e^{ikr}}{cr}\, i\omega\, \mathbf d_\omega \tag{10.44} $$

with $\mathbf d(t) = \sum q\mathbf r(t)$ the dipole moment of the charge distribution.

The next term in the series is

$$ \mathbf A^{(1)}_\omega(\mathbf r) = \frac{-ik\,e^{ikr}}{cr}\int d^3r'\, \mathbf j_\omega(\mathbf r')\, \mathbf n\cdot\mathbf r' = \frac{-ik\,e^{ikr}}{cr}\Big[\sum q\,\dot{\mathbf r}'\,(\mathbf n\cdot\mathbf r')\Big]_\omega \tag{10.45} $$

which involves the quadrupole moment of the charge distribution (here $[\ldots]_\omega$ denotes the temporal
Fourier transform); this radiation is called quadrupole radiation.

Figure 10.5: A simple antenna consists of two equal charges on a spring. The dipole moment for this
system vanishes. If the wavelength is very long compared to the size of the antenna then there is very
strong destructive interference of the radiation from the two charges. However, this cancellation is
not perfect, and once we allow for the finite light-crossing time of the system, then distant observers
will, in general, see some radiation. The strength of the field is typically smaller than the dipole
field that one of these charges would create by a factor $\sim L/\lambda$. The radiation is strongest towards
the poles, since the phase difference is then maximized, and zero for observers around the equator.
The energy has a $\cos^2(\theta)$ or ‘quadrupolar’ dependence.


For non-relativistic charge distributions, the lowest order non-vanishing term will dominate. In 
many cases this is the dipole term. In some cases the dipole may vanish. For example, consider a 
system consisting of a spring with equal charges at each end and undergoing linear oscillation, as 
illustrated in figure 10.5. The net current in this system vanishes, as does the dipole moment. In
the dipole approximation this system does not radiate. This is because to lowest order, the fields 
generated by the opposing currents cancel. However, this neglects the finite time it takes for the 
radiation to cross the system. In general, the field at large distance will be the sum of that from the 
two charges, but they will not be perfectly out of phase, and there will be some net radiation in the 
quadrupole term. Another example is a collision between two equal charges, for which there is also 
no net dipole moment, and the dominant radiation term is again the quadrupole component. 


The above formulae show that a given temporal frequency of the radiation field is generated by 
the corresponding frequency in the relevant moment of the charge distribution. Note that these 
frequencies need not correspond to the frequency of motion of the charges. Consider, for example, a 
system consisting of two charges rotating about their center of mass. In this case the period of the 
quadrupole moment is half the rotation period. 
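The remark about the rotating pair can be made concrete with a small calculation. In the sketch below we assume two equal charges on diametrically opposite points of a circular orbit (the equal charges, unit radius and unit angular frequency are illustrative assumptions) and evaluate the dipole moment $\sum q\mathbf r$ and the second moment $\sum q x^2$ as functions of time.

```python
import math

def moments(t, omega=1.0, q=1.0, radius=1.0):
    """Dipole vector and Q_xx = sum(q x^2) for two equal charges on opposite sides of an orbit."""
    positions = [
        (radius * math.cos(omega * t), radius * math.sin(omega * t)),
        (-radius * math.cos(omega * t), -radius * math.sin(omega * t)),
    ]
    dipole = (sum(q * x for x, _ in positions), sum(q * y for _, y in positions))
    q_xx = sum(q * x * x for x, _ in positions)
    return dipole, q_xx

rotation_period = 2.0 * math.pi          # for omega = 1
t_sample = 0.3
dipole_now, qxx_now = moments(t_sample)
_, qxx_half_period = moments(t_sample + rotation_period / 2.0)
_, qxx_quarter_period = moments(t_sample + rotation_period / 4.0)
print(dipole_now, qxx_now, qxx_half_period, qxx_quarter_period)
```

Here $Q_{xx} = qR^2(1 + \cos 2\omega t)$: the dipole vanishes identically, while the second moment repeats after half a rotation, so the quadrupole radiation emerges at twice the orbital frequency.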


10.7 Thomson Scattering 

Larmor’s formula gives the power radiated by an accelerated charge. Consider a point charge of
mass $m$ and charge $q$ lying in a beam of linearly polarized radiation of frequency $\omega_0$ with electric
field

$$ \mathbf E = E_0 \hat{\mathbf e} \sin\omega_0 t \tag{10.46} $$

as illustrated in figure 10.6. The force on the particle is $\mathbf F = q\mathbf E$ and results in an acceleration
$\ddot{\mathbf r} = q\mathbf E/m$ and therefore a 2nd derivative of the dipole moment

$$ \ddot{\mathbf d} = q\ddot{\mathbf r} = \frac{q^2 E_0}{m}\hat{\mathbf e}\sin\omega_0 t \tag{10.47} $$

so, according to Larmor, the power radiated (averaged over a cycle) is

$$ \frac{dP}{d\Omega} = \frac{\langle|\ddot{\mathbf d}|^2\rangle}{4\pi c^3}\sin^2\theta = \frac{q^4 E_0^2}{8\pi m^2 c^3}\sin^2\theta \tag{10.48} $$

and the total power radiated (or scattered) is

$$ P = \frac{q^4 E_0^2}{3 m^2 c^3}. \tag{10.49} $$

Figure 10.6: Geometry for the calculation of Thomson scattering. A linearly polarized beam of
radiation enters from the left and illuminates an electron, generating an oscillating dipole moment
as indicated. This dipole generates an outgoing wave of radiation with cylindrical symmetry.

The incident flux is $\langle S\rangle = c\langle E^2\rangle/4\pi = cE_0^2/8\pi$, and if we define a differential cross-section $d\sigma/d\Omega$
for scattering such that

$$ \frac{dP}{d\Omega} = \langle S\rangle \frac{d\sigma}{d\Omega} \tag{10.50} $$

then we have

$$ \frac{d\sigma}{d\Omega} = \frac{q^4}{m^2 c^4}\sin^2\theta \tag{10.51} $$

For an electron, $q = e$, this is

$$ \frac{d\sigma}{d\Omega} = r_0^2 \sin^2\theta \tag{10.52} $$

where we have defined

$$ r_0 = \frac{e^2}{m_e c^2} = 2.82 \times 10^{-13}\ {\rm cm} \tag{10.53} $$

as the classical radius of the electron (ie the radius of a cloud of charge $e$ such that its electrostatic
potential energy is equal to the electron rest-mass energy).

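The total cross section is

$$ \sigma = \int d\Omega\, \frac{d\sigma}{d\Omega} = 2\pi r_0^2 \int_{-1}^{+1} d\mu\,(1 - \mu^2) = \frac{8\pi}{3} r_0^2 \tag{10.54} $$

For an electron this gives the Thomson cross-section

$$ \sigma_T = 0.665 \times 10^{-24}\ {\rm cm}^2. \tag{10.55} $$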
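The two numbers quoted above follow directly from the fundamental constants. The sketch below evaluates $r_0 = e^2/m_e c^2$ and $\sigma_T = 8\pi r_0^2/3$ in Gaussian (CGS) units; the constants are rounded values that we supply, so the trailing digits should be treated as approximate.

```python
import math

# Rounded CGS (Gaussian) constants supplied for this sketch.
E_CHARGE = 4.8032e-10      # electron charge, esu
M_ELECTRON = 9.1094e-28    # electron mass, g
C_LIGHT = 2.9979e10        # speed of light, cm/s

def classical_radius(q=E_CHARGE, m=M_ELECTRON, c=C_LIGHT):
    """Classical radius r_0 = q^2 / (m c^2), in cm."""
    return q ** 2 / (m * c ** 2)

def thomson_cross_section(q=E_CHARGE, m=M_ELECTRON, c=C_LIGHT):
    """Total cross-section sigma = (8 pi / 3) r_0^2, in cm^2."""
    return 8.0 * math.pi / 3.0 * classical_radius(q, m, c) ** 2

r_0 = classical_radius()
sigma_t = thomson_cross_section()
print(f"r_0 = {r_0:.3e} cm, sigma_T = {sigma_t:.3e} cm^2")
```

Because $\sigma \propto 1/m^2$, the same function with a proton mass gives a cross-section smaller by a factor of a few million, which is why electrons dominate the scattering in ionized gas.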


The limitations on this result are 

• The energy per photon should be much less than the rest-mass energy of the electron: $h\nu \ll
m_e c^2$ or $h\nu \ll 0.5$ MeV. At higher energies we need to take the recoil of the electron into
account.

• The field should not be so strong as to accelerate the charge to relativistic velocity. 

The foregoing was for an incident linearly polarized beam. Scattering of natural (unpolarized) 
radiation can be computed by considering a superposition of two incoherent linearly polarized beams 
with perpendicular polarization states. 

Features of electron scattering: 

• The cross-section is independent of frequency. 

• The distribution of scattered energy is forward-backward symmetric. 

• The total cross-section for scattering of natural radiation is the same as for a polarized beam. 

• The scattered radiation is strongly linearly polarized (100% for the radiation scattered at 
right-angles to the incoming beam). 


10.8 Radiation Reaction 


Consider a simple harmonic oscillator consisting of a spring with a charged mass on the end. The 
energy radiated will tap the internal energy of this system and will cause the amplitude of the 
oscillations to decay. 

This may be described, at least phenomenologically, as a radiation reaction force. 

To order of magnitude, the time-scale for the oscillation to decay is $t_{\rm decay} \sim mv^2/P_{\rm Larmor} \sim (mc^3/e^2)\,t_{\rm orbit}^2 \sim c\,t_{\rm orbit}^2/r_0$ with $t_{\rm orbit} = v/\dot{v}$. Thus the decay time will greatly exceed the orbital time or oscillator period provided $t_{\rm orbit} \gg \tau$ where $\tau$ is the time for light to cross the classical radius of the electron: $\tau = 2e^2/3mc^3 \sim 10^{-23}$ s. If $t_{\rm orbit} \gg \tau$, as is usually the case, radiation reaction can be treated as a small perturbation.

The radiation reaction force $\mathbf{F}$ must satisfy

$$\int dt\,\mathbf{F}\cdot\mathbf{u} = -\int dt\,P_{\rm Larmor} = -\int dt\,\frac{2e^2\dot{u}^2}{3c^3}$$ (10.56)

where the integrations are over one or more periods of the oscillation. An acceptable choice is

$$\mathbf{F} = \frac{2e^2\ddot{\mathbf{u}}}{3c^3} = m\tau\ddot{\mathbf{u}}$$ (10.57)

though this should be treated with caution (see R+L).


10.9 Radiation from Harmonically Bound Particles


Including the radiation reaction force, the equation of motion for an electron tethered by a spring with spring constant $k$, and corresponding frequency $\omega_0 = \sqrt{k/m_e}$, is

$$\frac{d^2x}{dt^2} + \omega_0^2 x - \tau\frac{d^3x}{dt^3} = 0$$ (10.58)

Evaluating the radiation reaction term using the undamped solution $x \propto \cos\omega_0 t$ gives $d^3x/dt^3 = -\omega_0^2\,dx/dt$ and the E.O.M. becomes

$$\ddot{x} + \omega_0^2\tau\dot{x} + \omega_0^2 x = 0$$ (10.59)

in which the radiation reaction term now takes the same form as would a frictional force.

Searching for solutions of the form $x \propto \exp(\alpha t)$ (with $\alpha$ complex to allow for decaying oscillations) converts this to an algebraic equation

$$\alpha^2 + \omega_0^2\tau\alpha + \omega_0^2 = 0$$ (10.60)

with solutions (to lowest order in $\omega_0\tau$)

$$\alpha = \pm i\omega_0 - \frac{1}{2}\omega_0^2\tau$$ (10.61)

With initial conditions $x(0) = x_0$ and $\dot{x}(0) = 0$ the solution is

$$x(t) = x_0 e^{-\Gamma t/2}\cos\omega_0 t$$ (10.62)

which is a damped oscillation with decay rate

$$\Gamma = \omega_0^2\tau = \frac{2e^2\omega_0^2}{3mc^3}$$ (10.63)

in agreement with the order of magnitude argument above.
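
As a quick check of (10.60) and (10.61), the quadratic can be solved exactly and compared with the approximate roots. The sketch below uses only the standard library; the function name and the sample values of $\omega_0$ and $\tau$ are illustrative assumptions.

```python
import cmath

def damped_roots(omega0, tau):
    """Exact roots of alpha^2 + omega0^2 * tau * alpha + omega0^2 = 0, equation (10.60)."""
    b = omega0**2 * tau
    disc = cmath.sqrt(b * b - 4.0 * omega0**2)
    return (-b + disc) / 2.0, (-b - disc) / 2.0

# Illustrative values with omega0 * tau << 1 (weak damping)
omega0, tau = 1.0, 1.0e-3
r_plus, r_minus = damped_roots(omega0, tau)
print(r_plus, r_minus)   # real parts equal -omega0^2 tau / 2, imaginary parts close to +/- omega0
```

The real part of each root is exactly $-\omega_0^2\tau/2 = -\Gamma/2$, while the imaginary part differs from $\pm\omega_0$ only at second order in $\omega_0\tau$.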

The transform of this decaying oscillator is

$$x(\omega) = \int dt\,x(t)e^{i\omega t} = \frac{x_0}{2}\left[\frac{1}{\Gamma/2 - i(\omega-\omega_0)} + \frac{1}{\Gamma/2 - i(\omega+\omega_0)}\right]$$ (10.64)

which is small, save in the vicinity of $\omega = \pm\omega_0$. The radiated power is $dW/d\omega \propto \omega^4|x(\omega)|^2$ or, more precisely

$$\frac{dW}{d\omega} = \frac{8\pi\omega^4}{3c^3}\frac{e^2x_0^2}{(4\pi)^2}\frac{1}{(\omega-\omega_0)^2 + (\Gamma/2)^2}$$ (10.65)

The radiated power thus has a Lorentzian profile.

Radiation damping therefore imposes a minimum width on spectral lines for electronic oscillators: $\Delta\omega = \Gamma = 2e^2\omega_0^2/3mc^3$ or equivalently a width in wavelength of $\Delta\lambda = (\lambda/\omega_0)\delta\omega = 2\pi c\tau \simeq 1.2\times10^{-4}$ Å.
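
The quoted values of $\tau$ and of the natural line width follow directly from the constants. A minimal sketch, assuming standard CGS values (not given in the text) and using illustrative function names:

```python
import math

# Standard CGS constants (assumed; not quoted in the notes)
E_CHARGE = 4.80320e-10   # electron charge [esu]
M_E = 9.10938e-28        # electron mass [g]
C = 2.99792e10           # speed of light [cm/s]

def radiation_timescale():
    """tau = 2 e^2 / (3 m c^3), in seconds."""
    return 2.0 * E_CHARGE**2 / (3.0 * M_E * C**3)

def natural_width_angstrom():
    """Delta lambda = 2 pi c tau, converted from cm to Angstrom."""
    return 2.0 * math.pi * C * radiation_timescale() * 1.0e8

print(radiation_timescale(), natural_width_angstrom())
```

Note that the wavelength width is independent of $\omega_0$, since $\Gamma \propto \omega_0^2$ while $\lambda/\omega_0 \propto \omega_0^{-2}$.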


10.10 Scattering by Bound Charges 

In the case of scattering by free charges (Thomson scattering) the incident electric field produces an acceleration proportional to the field, and hence $\ddot{d} \propto E$ and therefore a scattered radiation field with amplitude proportional to the incoming field, resulting in a scattering cross-section which is independent of frequency.

In the case of scattering by a charge bound in a quadratic potential well with free oscillation frequency $\omega_0$, we expect the cross-section to be equal to $\sigma_T$ for $\omega \gg \omega_0$. For low-frequency incident radiation, $\omega \ll \omega_0$, the situation is different; here the incident field causes the charge to move to a displaced position such that the electric field is balanced by the spring constant force, but this results in a displacement, and therefore a dipole moment, which is proportional to the field (rather than the second time derivative of the dipole being proportional to the field).

For a free charge, the dipole amplitude is $d \propto E/\omega^2$ whereas for the bound charge and $\omega \ll \omega_0$, the dipole amplitude is $d \sim E/\omega_0^2$, which is smaller by a factor $(\omega/\omega_0)^2 \ll 1$, and the scattered radiation intensity, being proportional to the square of the dipole, is smaller than for free charges by a factor $(\omega/\omega_0)^4$. Thus for low frequency radiation we expect

$$\sigma(\omega) \simeq (\omega/\omega_0)^4\sigma_T$$ (10.66)

A more detailed analysis shows that there is a resonance at $\omega = \omega_0$ where the cross section becomes very large. In the absence of damping, this resonance is infinitely sharp, and the cross-section becomes formally infinite. Including radiation, or other, damping results in finite width to the resonance $\delta\omega \sim \Gamma$.
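
The three regimes can be seen in a single formula. The sketch below assumes the standard driven, damped oscillator form of the cross-section (as given in R+L), $\sigma/\sigma_T = \omega^4/[(\omega^2-\omega_0^2)^2 + (\omega_0^3\tau)^2]$; this expression is not derived in these notes and the function name is illustrative.

```python
def bound_cross_section_ratio(omega, omega0, tau):
    """sigma(omega) / sigma_T for a harmonically bound charge with radiation damping.

    Assumed form (driven damped oscillator, damping rate Gamma = omega0^2 * tau):
        omega^4 / ((omega^2 - omega0^2)^2 + (omega0^3 * tau)^2)
    """
    return omega**4 / ((omega**2 - omega0**2)**2 + (omega0**3 * tau)**2)

omega0, tau = 1.0, 1.0e-3           # illustrative values with omega0 * tau << 1
for omega in (0.01, 1.0, 100.0):    # low frequency, resonance, high frequency
    print(omega, bound_cross_section_ratio(omega, omega0, tau))
```

At low frequency the ratio tends to $(\omega/\omega_0)^4$ as in (10.66), at high frequency it tends to unity (Thomson scattering), and on resonance it reaches $1/(\omega_0\tau)^2$.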

10.11 Problems 

10.11.1 Antenna beam pattern 

Two oscillating dipole moments (antennae) $d_1$, $d_2$ are oriented vertically and are a horizontal distance $L$ apart. They oscillate in phase with the same frequency $\omega$. Consider radiation emitted in a direction at angle $\theta$ with respect to the vertical and in the vertical plane containing the two dipoles.

a. Show that at large distances $D \gg L$ the angular distribution of the radiated power is

$$\frac{dP}{d\Omega} = \frac{\omega^4\sin^2\theta}{4\pi c^3}\left(d_1^2 + 2d_1d_2\cos\delta + d_2^2\right)$$ (10.67)

where the 'phase angle' is $\delta = \omega L\sin\theta/c$.

b. Show that when $L \ll \lambda$ the radiation is the same as from a single oscillating dipole of amplitude $d_1 + d_2$.

c. Generalise to a 2-dimensional array $n(x,y) = \sum_i\delta(x - x_i, y - y_i)$ containing a large number of antennae laid out in the plane $z = 0$, now with their dipoles lying in the horizontal plane (parallel to the $x$-axis, say). Show that the 'synthesised beam pattern', which we define to be the power radiated as a function of direction $\theta$, is just the square of the modulus of the fourier transform of the array pattern $n$:

$$P(\theta_x,\theta_y) \propto \left|\int\!\!\int dx\,dy\,n(x,y)\,e^{2\pi i(x\theta_x + y\theta_y)/\lambda}\right|^2 = \left|\sum_i e^{2\pi i\,\mathbf{x}_i\cdot\boldsymbol{\theta}/\lambda}\right|^2$$ (10.68)

(you may adopt the small angle approximation).

d. The VLA is a Y-shaped array of radio receivers (which could also be used as a highly directional 
transmitter). Sketch the resulting beam pattern. 

10.11.2 Multipole radiation 1 

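At large distances $R_0$ from a relatively compact distribution of radiating currents the vector potential can be written as

$$\mathbf{A} = \frac{1}{cR_0}\int d^3r\,\mathbf{j}(\mathbf{r}, t' + \mathbf{r}\cdot\mathbf{n}/c)$$ (10.69)

where $t' = t - R_0/c$ and $\mathbf{n}$ is a unit vector in the direction from the radiating system to the observer.

a. Expand in powers of $\mathbf{r}\cdot\mathbf{n}/c$ to obtain

$$\mathbf{A} = \frac{1}{cR_0}\int d^3r\,\mathbf{j}(\mathbf{r},t') + \frac{1}{c^2R_0}\frac{\partial}{\partial t'}\int d^3r\,(\mathbf{r}\cdot\mathbf{n})\,\mathbf{j}(\mathbf{r},t')$$ (10.70)

b. Show that, for a system of point charges, this becomes

$$\mathbf{A} = \frac{1}{cR_0}\sum q\mathbf{v} + \frac{1}{c^2R_0}\frac{\partial}{\partial t}\sum q\mathbf{v}(\mathbf{r}\cdot\mathbf{n})$$ (10.71)

c. Show that

$$2\mathbf{v}(\mathbf{r}\cdot\mathbf{n}) = \frac{\partial}{\partial t}\left[\mathbf{r}(\mathbf{r}\cdot\mathbf{n})\right] + (\mathbf{r}\times\mathbf{v})\times\mathbf{n}$$ (10.72)

and thus obtain

$$\mathbf{A} = \frac{\dot{\mathbf{d}}}{cR_0} + \frac{1}{2c^2R_0}\frac{\partial^2}{\partial t^2}\sum q\,\mathbf{r}(\mathbf{n}\cdot\mathbf{r}) + \frac{1}{cR_0}(\dot{\mathbf{m}}\times\mathbf{n})$$ (10.73)

where the dipole moment is $\mathbf{d} = \sum q\mathbf{r}$ and the magnetic moment is $\mathbf{m} = \frac{1}{2}\sum q\,\mathbf{r}\times\mathbf{v}/c$. Thus, in this approximation, the potential contains terms proportional to the first time derivative of the dipole (dipole radiation); the second time derivative of the quadrupole moment (quadrupole radiation) and the first time derivative of the magnetic moment (magnetic dipole radiation).

d. Show that the radiation field intensity from a bar magnet spinning about an axis perpendicular to the line joining its poles is on the order of $E, B \sim \ddot{m}/c^2R_0$ and that the radiated power is $P \sim \ddot{m}^2/c^3$.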


10.11.3 Multipole radiation 2 

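Starting with the expression for the retarded vector potential in the form

$$\mathbf{A}(\mathbf{r},t) = \frac{1}{c}\int d^3r'\int dt'\,\frac{\mathbf{j}(\mathbf{r}',t')}{|\mathbf{r}-\mathbf{r}'|}\,\delta(t' - t + |\mathbf{r}-\mathbf{r}'|/c)$$ (10.74)

a. Show that the temporal fourier transforms $\mathbf{A}_\omega(\mathbf{r}) = \int dt\,\mathbf{A}(\mathbf{r},t)\exp(i\omega t)$ and $\mathbf{j}_\omega(\mathbf{r}) = \int dt\,\mathbf{j}(\mathbf{r},t)\exp(i\omega t)$ are related by

$$\mathbf{A}_\omega(\mathbf{r}) = \frac{1}{c}\int d^3r'\,\mathbf{j}_\omega(\mathbf{r}')\frac{e^{ik|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}$$ (10.75)

where $k = \omega/c$. Note that this gives a one-to-one relationship between the temporal fourier components of $\mathbf{A}$ and $\mathbf{j}$. Note also that the field $\mathbf{A}_\omega(\mathbf{r})$ is just a convolution of the source $\mathbf{j}_\omega(\mathbf{r})$ with the convolution kernel $e^{ikr}/r$.

b. For sources of size $L$ and field points at distances $r \gg L$ we can take $|\mathbf{r}-\mathbf{r}'| \simeq r - \mathbf{n}\cdot\mathbf{r}'$ in the exponential (where $\mathbf{r}'$ is measured relative to a spatial origin inside the source) and one can take $|\mathbf{r}-\mathbf{r}'| \simeq r$ in the denominator. Thus show that in this approximation

$$\mathbf{A}_\omega(\mathbf{r}) = \frac{e^{ikr}}{cr}\sum_{m=0}^{\infty}\frac{1}{m!}\int d^3r'\,\mathbf{j}_\omega(\mathbf{r}')(-ik\,\mathbf{n}\cdot\mathbf{r}')^m$$ (10.76)

For sources with dimension $L \ll \lambda$, this is an expansion in the small dimensionless quantity $k\,\mathbf{n}\cdot\mathbf{r}'$. The $m = 0$ component gives the dipole radiation, $m = 1$ gives the quadrupole term etc.

c. Now specialise to the case of a point charge $q$ moving in a circle of radius $r_0$ in the $z = 0$ plane with frequency $\omega_0$:

$$\mathbf{j}(\mathbf{r},t) = q\mathbf{v}\,\delta(\mathbf{r} - \mathbf{R}(t))$$ (10.77)

where $\mathbf{R} = \{r_0\cos\omega_0t, r_0\sin\omega_0t, 0\}$, $\mathbf{v} = \{-\omega_0r_0\sin\omega_0t, \omega_0r_0\cos\omega_0t, 0\}$. Show that the dipole radiation is non-zero only at frequency $\omega = \omega_0$, the quadrupole radiation appears at $\omega = 2\omega_0$ and so on.

10.11.4 Electron scattering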

Larmor's formula gives the power radiated by an accelerated charge as $P \sim q^2a^2/c^3$ where $q$ is the charge and $a$ is the acceleration.

a. To order of magnitude, estimate the cross section for scattering of radiation by a free electron in terms of the electronic charge, mass and the speed of light.

b. Now consider an electron bound in a parabolic potential well such that the free oscillation frequency is $\omega_0$. Estimate the cross section for scattering of light as a function of frequency $\omega$ for the two limiting regimes $\omega \gg \omega_0$ and $\omega \ll \omega_0$.

10.11.5 Thomson drag 

Consider a radiation field which, in a particular frame of reference, appears isotropic with specific intensity $I_\nu$.

a. Show that a photon with frequency $\nu$ in the isotropic frame will appear to an observer moving at velocity $v = \beta c \ll c$ with respect to this frame to have frequency $\nu' = (1 - \beta\mu)\nu$ where $\mu$ is the cosine of the angle between the photon momentum and the direction of motion.

b. Invoke Lorentz invariance of $I_\nu/\nu^3$ to show that the specific intensity seen by the moving observer is

$$I'_\nu = I_\nu - \beta\mu\left(3I_\nu - \nu\,dI_\nu/d\nu\right)$$ (10.78)

and therefore that

$$I' = I - 4\beta\mu I$$ (10.79)

c. Show that an electron moving through this radiation will feel a drag force $F = (4/3)\beta U\sigma_T$ where $U = 4\pi I/c$ is the energy density and $\sigma_T$ is the Thomson cross-section.

d. Specialise to a black-body radiation field and show that the velocity of an electron will decay exponentially with $v \propto \exp(-t/\tau_e)$ on a timescale $\tau_e \sim m_ec/(aT^4\sigma_T)$.

e. Compute the decay time $\tau_p$ for the velocity of a blob of fully ionized plasma moving relative to an isotropic black body radiation field.

f. Compute these timescales for electrons or plasma moving through the microwave background ($T = 2.7$ K) and compare to the age of the Universe $t \sim H^{-1} \sim 10^{10}$ years. Conclude that these effects are of little consequence at the present. However, in the expanding universe, the drag rates scale as $\tau^{-1} \propto T^4 \propto (1+z)^4$ while the Hubble rate scales as $H \propto (1+z)^{3/2}$ (matter dominated). Estimate the redshifts at which $H\tau_e = 1$ and $H\tau_p = 1$. At redshifts exceeding the latter, plasma will be effectively locked to the frame of isotropy of the microwave background. At redshifts exceeding the former, the microwave background will act as an efficient coolant for hot ionized gas.

10.11.6 Polarisation 

Sketch the pattern of linear polarisation expected for an unpolarised point source embedded in an optically thin cloud of ionized gas (assume the scattering is dominated by electron scattering).


Chapter 11


Cerenkov Radiation 


Most radiation generation involves accelerated charges, since a uniformly moving charge does not radiate. An exception to this rule is a charge which is moving faster than the speed of light. This super-luminal motion may arise in two ways; ordinarily when a high energy particle passes through a medium with refractive index $n > 1$, such as a cosmic ray entering the atmosphere. Alternatively, one may have a charge concentration which moves super-luminally (think of caterpillar legs). This is not as crazy as it sounds; a group in Oxford are building a device which produces a charge pattern with super-luminal motion.

Cerenkov radiation can be likened to the sonic boom from a supersonic airplane. Figure 11.1 shows how a supersonic, or super-luminal, particle will outrun any disturbance it makes, and that one would expect a conical pulse of radiation, with the normal to the surface of the cone making an angle $\cos^{-1}(c/v)$ with the direction of motion of the particle.

We will analyze the resulting Cerenkov or Heaviside radiation first using the retarded potential, and then using the LW potentials.

Figure 11.1: A supersonic plane excites a 'sonic-boom'; a wave pulse propagating out with conical wave fronts with angle, relative to the direction of motion, $\theta = \cos^{-1}(c/v)$.


11.1 Retarded Potential

For a field point very distant from the source of the radiation, the spectral decomposition of the magnetic potential can be written as

$$\mathbf{A}_\omega(\mathbf{r}) = \frac{e^{ikr}}{cr}\int dt'\int d^3r'\,\mathbf{j}(\mathbf{r}',t')\,e^{i\omega t'}e^{-ik\mathbf{n}\cdot\mathbf{r}'}$$ (11.1)

where $\mathbf{n} = \mathbf{r}/r$.

If we now introduce a current density which describes a charge (or a charge concentration) moving at a uniform velocity $v$ along the $z$-axis:

$$\mathbf{j}(\mathbf{r},t) = \hat{\mathbf{z}}\,qv\,\delta(x)\delta(y)\delta(z - vt)$$ (11.2)

the potential is

$$\mathbf{A}_\omega(\mathbf{r}) = \frac{e^{ikr}}{cr}\,qv\hat{\mathbf{z}}\int dt\,e^{i\omega(1 - n_zv/c)t}$$ (11.3)

Now the integral here is a representation of the $\delta$-function: $2\pi\delta(\omega(1 - n_zv/c))$. Since $n_z \le 1$, the argument of the delta function is non-zero for all $\omega$ if $v < c$, and there is no radiation. However, if $v > c$ the argument of the $\delta$-function vanishes for the direction $n_z = c/v < 1$ and the potential is very strong (formally infinite). This is just the direction of the outgoing conical wave.
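
The condition $n_z = c/v$ fixes the opening angle of the cone. As a sketch, for a particle of speed $\beta c_{\rm vac}$ in a medium of refractive index $n$ the wave speed is $c_{\rm vac}/n$, so the condition reads $\cos\theta = 1/(n\beta)$. The function name and the example value $n = 1.33$ (roughly that of water) are illustrative assumptions, not taken from the text.

```python
import math

def cherenkov_angle_deg(beta, n):
    """Angle (degrees) between the wave normal and the particle direction.

    beta : particle speed in units of the vacuum speed of light
    n    : refractive index of the medium
    Returns None below threshold (n * beta <= 1), where nothing is radiated.
    """
    cos_theta = 1.0 / (n * beta)
    if cos_theta >= 1.0:
        return None
    return math.degrees(math.acos(cos_theta))

print(cherenkov_angle_deg(1.0, 1.33))   # ultra-relativistic particle
print(cherenkov_angle_deg(0.5, 1.33))   # below threshold
```

Below threshold the $\delta$-function argument in the text never vanishes and the function returns None.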

In addition to being highly directional, the sonic-boom analogy suggests that the radiation would have a rather blue spectrum. This is indeed the case. To determine the spectrum, consider a relativistic particle propagating superluminally for finite time $T$, with $v > c$ (as, for example, when an energetic particle passes through a slab with refractive index $n > 1$; see figure 11.2). The time integral is now bounded, and becomes a 'sinc' function, rather than a $\delta$-function, and the magnetic potential is then

$$\mathbf{A}_\omega(\mathbf{r}) = \frac{qv\hat{\mathbf{z}}T}{c}\frac{e^{ikr}}{r}\,{\rm sinc}(\omega(1 - n_zv/c)T/2)$$ (11.4)

with ${\rm sinc}(x) = \sin(x)/x$. The field is obtained in the usual way from the potential as $\mathbf{B} = \nabla\times\mathbf{A}$. Again as usual, the most rapidly spatially varying factor here is $e^{ikr}$. When we apply the gradient operator we can effectively ignore the variation of the $1/r$ factor provided $r \gg 1/k$. The sinc function also depends on $\mathbf{r}$ since $n_z = \hat{r}_z$, but this contribution to the spatial gradient is also negligible for $r \gg vT$. If these conditions are satisfied, the field is then

$$\mathbf{B}_\omega(\mathbf{r}) = -\frac{i\omega qv\,\hat{\mathbf{z}}\times\hat{\mathbf{r}}\,T}{c^2}\frac{e^{ikr}}{r}\,{\rm sinc}(\omega(1 - n_zv/c)T/2).$$ (11.5)

The Poynting flux is $S = c(E^2 + B^2)/8\pi = cB^2/4\pi$ and is directed along $\hat{\mathbf{r}}$. The total energy emitted into solid angle $d\Omega$ is

$$dW = r^2d\Omega\int dt\,S(t) = \frac{cr^2d\Omega}{4\pi}\int dt\,B^2(t)$$ (11.6)

or equivalently, using Parseval's theorem and counting positive frequencies only, the energy per unit area in the frequency interval $d\omega$ is $dS_\omega = c|B_\omega|^2d\omega/(2\pi)^2$. The system is cylindrically symmetric, so the solid angle is $d\Omega = 2\pi\sin\theta\,d\theta$ and the amount of energy emerging per unit frequency per unit angle is

$$d^2W = 2\pi r^2dS_\omega\,d\theta\sin\theta = \frac{q^2v^2\omega^2T^2\sin^3\theta\,d\theta\,d\omega}{2\pi c^3}\,{\rm sinc}^2(\omega(1 - n_zv/c)T/2)$$ (11.7)

where we have used $|\hat{\mathbf{z}}\times\hat{\mathbf{r}}| = \sin\theta$.

Equation (11.7) tells us how the emergent energy is distributed over frequency and over angle. The sinc$^2$ factor limits the contribution to a small range in angle $\Delta\theta \sim 1/(\omega T\tan\theta_0)$ around $\theta_0 = \cos^{-1}(c/v)$. Integrating over direction to obtain the total spectrum, we can ignore the variation of the $\sin^3\theta$ term. Changing integration variable from $\theta$ to $\zeta = \omega(1 - n_zv/c)T/2 \simeq \omega\tan\theta_0(\theta - \theta_0)T/2$, so $d\theta = 2d\zeta/(\omega T\tan\theta_0)$, gives

$$dW = \frac{q^2v(1 - c^2/v^2)T\omega\,d\omega}{\pi c^2}\int d\zeta\,\frac{\sin^2\zeta}{\zeta^2}.$$ (11.8)

The dimensionless integral here introduces a factor of order unity. Equation (11.8) shows that the energy radiated is proportional to the time of flight, as one might have expected, and therefore to the path length through the refractive medium. It also shows that the spectrum is blue, with $dW/d\omega \propto \omega$.
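
The statement that the dimensionless integral is of order unity is easy to verify numerically: taken over the whole real line, $\int d\zeta\,\sin^2\zeta/\zeta^2 = \pi$. A minimal sketch using a midpoint rule over a large but finite range (the range and step count are arbitrary choices for this illustration):

```python
import math

def sinc2_integral(half_width=1000.0, n=400000):
    """Midpoint-rule estimate of the integral of sin^2(z)/z^2 over [-half_width, half_width].

    With n even, no midpoint falls exactly at z = 0, so the integrand is always finite.
    """
    dz = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        z = -half_width + (i + 0.5) * dz
        total += (math.sin(z) / z)**2 * dz
    return total

print(sinc2_integral(), math.pi)
```

The truncated tails contribute only about $1/\texttt{half\_width}$, so the estimate falls slightly below $\pi$.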


11.2 LW Potentials 

The treatment above provides a useful illustration of the use of retarded potentials. It is also instructive to obtain the form of the radiation pulse using the Liénard-Wiechert potentials.

Recall that the LW potentials are just equal to the electro-static and magneto-static potentials times a factor $1/(1 - \hat{\mathbf{R}}\cdot\mathbf{v}/c)$. This means that the potential becomes very large for a particle moving directly towards the observer (which can also be understood in terms of relativistic beaming). An entirely analogous situation arises here where the particle world line can 'graze' the observer's light cone, as illustrated in figure 11.3. This is very different from the situation for a particle moving slower than $c$, for which the particle world line pierces the past light cone of any point on the observer's world line, and does so exactly once. There is therefore a specific instant on the observer's world line when the particle grazes its past light cone. At that instant $\hat{\mathbf{R}}\cdot\mathbf{v}/c = 1$ and the effective charge of the particle becomes infinite. At later observer times, the particle pierces the light cone twice, and the observer perceives a potential which is the sum of that from two particles of finite charge.

As another way of looking at this, we saw that the velocity dependent boost factor $1/(1 - \hat{\mathbf{R}}\cdot\mathbf{v}/c)$ was equivalent to $dt/d\tau$ where $dt$ is the coordinate time interval spent by the particle between two observer past light cones whose apices differ by time $d\tau$. Clearly, as the particle grazes the critical light cone, the observer time $\tau$ is stationary with respect to the coordinate time, and $dt/d\tau \rightarrow \infty$.

Figure 11.3: Intersection of the world-line of a super-luminal charged particle (or charge disturbance) and the past light cone of an observer. For sufficiently early times ($\tau < \tau_*$) the particle world-line does not intersect the light cone at all, and the observer sees no field whatsoever. For $\tau = \tau_*$ the particle world-line just grazes the light cone, as illustrated by the opaque cone. For later times (e.g. semi-transparent cone) the world line intersects the past light cone at two points.

Let the charged particle move along the trajectory

$$\mathbf{r}(t) = (0, 0, vt)$$ (11.9)

and let us place the observer at location $\mathbf{r}_{\rm obs} = (x_0, 0, 0)$. The intersection of the past light cone of the event $(\tau, x_0, 0, 0)$ and the plane $x = 0$, $y = 0$ is

$$z^2 = c^2(t - \tau)^2 - x_0^2.$$ (11.10)

The intersection of the particle's trajectory and the past light cone occurs at coordinate times $t$ such that $z = vt = \sqrt{c^2(t - \tau)^2 - x_0^2}$ or

$$t = \frac{-\tau \pm \sqrt{\tau^2v^2/c^2 - (v^2/c^2 - 1)x_0^2/c^2}}{v^2/c^2 - 1}$$ (11.11)

This gives two real roots only for $\tau > \tau_*$ where

$$c\tau_* = x_0\sqrt{1 - c^2/v^2}.$$ (11.12)

This is the instant of grazing. Note that we need to use $\tau_* > 0$. The alternate case corresponds to the particle grazing the future light cone.
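
Equations (11.11) and (11.12) can be explored numerically. The sketch below works in units with $c = 1$ by default; the function names and sample values ($x_0 = 1$, $v = 2c$) are illustrative assumptions. It also evaluates $1 - \hat{\mathbf{R}}\cdot\mathbf{v}/c$ with $\mathbf{R} = (x_0, 0, -vt)$, which should vanish at the grazing instant.

```python
import math

def grazing_time(x0, v, c=1.0):
    """Observer time tau_* of first contact, from c tau_* = x0 sqrt(1 - c^2/v^2), eq. (11.12)."""
    return x0 * math.sqrt(1.0 - c * c / (v * v)) / c

def emission_times(tau, x0, v, c=1.0):
    """Roots (t_minus, t_plus) of equation (11.11); None if the light cone is not yet reached."""
    g = v * v / (c * c) - 1.0                       # positive for super-luminal motion
    disc = tau * tau * v * v / (c * c) - g * x0 * x0 / (c * c)
    if disc < -1e-12:
        return None
    root = math.sqrt(max(disc, 0.0))                # clamp round-off at the grazing instant
    return (-tau - root) / g, (-tau + root) / g

def beaming_factor(t, x0, v, c=1.0):
    """1 - R_hat . v / c for emission at coordinate time t, with R = (x0, 0, -v t)."""
    R = math.hypot(x0, v * t)
    return 1.0 - (-v * t) * v / (R * c)

tau_star = grazing_time(1.0, 2.0)
print(emission_times(0.5, 1.0, 2.0))        # before grazing: no intersection
print(emission_times(tau_star, 1.0, 2.0))   # at grazing: a double root
print(emission_times(2.0, 1.0, 2.0))        # later: two distinct emission times
```

At $\tau = \tau_*$ the two roots coincide at $t = -\tau_*/(v^2/c^2 - 1)$, and the beaming factor evaluated there is zero to within round-off.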

Now with $\mathbf{v} = (0, 0, v)$ and $\mathbf{R} = (x_0, 0, -vt)$ one can readily show that at the instant of grazing, $t = -\tau_*/(v^2/c^2 - 1)$, the factor $1 - \hat{\mathbf{R}}\cdot\mathbf{v}/c$ vanishes, as claimed above. Alternatively, and more simply, taking the derivative of (11.11) with respect to $\tau$ we find

$$\frac{dt}{d\tau} = \frac{-1 \pm (v/c)\,\tau/\sqrt{\tau^2 - \tau_*^2}}{v^2/c^2 - 1}$$ (11.13)

The observer feels the potential from two sources, $\mathbf{A} = \mathbf{A}_+ + \mathbf{A}_-$, with

$$\mathbf{A}_\pm(\tau) = \frac{q\mathbf{v}}{c\,r(t_\pm)}\left|\frac{dt_\pm}{d\tau}\right|$$ (11.14)

Specialising to the behaviour of the pulse close to the leading edge (i.e. $\tau \simeq \tau_*$), we have $\sqrt{\tau^2 - \tau_*^2} = \sqrt{(\tau + \tau_*)(\tau - \tau_*)} \simeq \sqrt{2}\sqrt{\tau_*}\sqrt{\tau - \tau_*}$, and the potential is

$$\mathbf{A}(\tau) \simeq \frac{q\mathbf{v}}{cr}\,\frac{v/c}{v^2/c^2 - 1}\sqrt{\frac{2\tau_*}{\tau - \tau_*}}$$ (11.15)

with $r = x_0/\sin\theta$.

The form of the resulting potential wave pulse is therefore as illustrated schematically in figure 11.4. The potential diverges as $1/\sqrt{\Delta\tau}$ as $\tau$ approaches $\tau_*$. The characteristic form is identical to that of a 'fold-caustic'.

Figure 11.4: The form of a pulse of Cerenkov radiation.

The magnetic field is $\mathbf{B} = \nabla\times\mathbf{A}$. Since $\mathbf{v} = v\hat{\mathbf{z}}$, the field at $(x_0, 0, 0)$ is $\mathbf{B} = \hat{\mathbf{y}}\,\partial A/\partial x_0$. Now $x_0$ appears in $\tau_* = x_0\sin\theta/c$ and also in $r$. Close to the pulse edge, however, the most rapid variation is in the factor $1/\sqrt{\tau - x_0\sin\theta/c}$ and we can therefore replace $\partial A/\partial x_0$ by $-c^{-1}\sin\theta\,\partial A/\partial\tau$. The magnitude of the magnetic field is then

$$B \simeq \frac{q\,(1 - c^2/v^2)^{1/4}}{\sqrt{2x_0c^3(\tau - \tau_*)^3}}$$ (11.16)

The field therefore diverges as $1/\Delta\tau^{3/2}$ as $\tau \rightarrow \tau_*$. This implies a formally infinite (integrated) Poynting flux: $\int d\tau\,B^2 \propto 1/\Delta\tau^2$. This divergence is the real-space analog of the divergent energy computed from the power spectrum, $\int dW \propto \int\omega\,d\omega \propto \omega_{\rm max}^2$. This divergence would be removed, for example, by introducing a finite width for the shower of particles in an atmospheric Cerenkov shower.

One can also compute the power spectrum of the radiation, and the dependence on $v/c$, from the potential (11.15). Taking the temporal transform we find

$$A_\omega = \int d\tau\,A(\tau)e^{i\omega\tau} \sim \frac{qv}{cr}\,\frac{v/c}{v^2/c^2 - 1}\sqrt{\frac{\tau_*}{\omega}}$$ (11.17)

where we have dropped some dimensionless factors of order unity. Since $cB(\tau) = -\sin\theta\,\partial A/\partial\tau$, $B_\omega = i\omega\sin\theta A_\omega/c$ and the Poynting flux is then

$$dS_\omega \sim cB_\omega^2d\omega \sim \frac{q^2\sin^2\theta\,\tau_*\,\omega\,d\omega}{c\,r^2(1 - c^2/v^2)^2}.$$ (11.18)

If we erect a cylinder of length $L$ and radius $x_0$ around the particle trajectory, the component of the energy flux in the direction normal to the surface is $\sin\theta\,dS$. The area of the cylinder is $2\pi x_0L$, so the energy crossing the surface (per unit frequency) is $dW = 2\pi x_0L\sin\theta\,dS_\omega$. With $x_0 = r\sin\theta$, $c\tau_* = x_0\sin\theta$ and with $L = vT = cT/\cos\theta$ we obtain

$$dW \sim \frac{q^2v(1 - c^2/v^2)T\omega\,d\omega}{c^2}$$ (11.19)

in agreement with (11.8).


Astronomical applications include cosmic-ray detection via atmospheric Cerenkov radiation, and also in solid state Cerenkov detectors. The other scenario for generating radiation from a super-luminal charge or current disturbance may have implications for radiation from pulsars.




Chapter 12 

Bremsstrahlung 


Bremsstrahlung, or 'braking-radiation', also known as free-free emission, is produced by collisions between particles in hot ionized plasmas.

• Bremsstrahlung arises predominantly from collisions between electrons and ions. Electron- 
electron collisions are ineffective as they produce no dipole radiation. Collisions between ions 
with different charge-to-mass ratio are capable of generating dipole radiation, but their low 
accelerations render them also unimportant. 

• In an electron-ion collision we can take the ion to be unaccelerated. 

• Precise results require quantum treatment, but useful approximate results can be obtained 
from a classical calculation of the dipole radiation, with plausible cut-offs. 

In what follows we will first compute the radiation power spectrum from a single collision with 
given electron velocity and impact parameter. We then integrate over impact parameter to get the 
emission from a single-speed electron component, and then integrate over a thermal distribution 
of electron velocities to obtain the thermal bremsstrahlung emissivity. We briefly mention thermal 
bremsstrahlung absorption and the emission from a plasma with relativistic electron velocities. 


12.1 Radiation from a Single Collision 


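The geometry for the collision of an electron with an ion of charge $+Ze$ is shown in figure 12.1. The energy radiated into a range of directions $d\Omega$ around $\hat{\mathbf{r}}$ is, as usual,

$$dW = r^2d\Omega\int dt\,S(t)$$ (12.1)

with Poynting flux

$$S(t) = \frac{cE^2(t)}{4\pi} = \frac{cB^2(t)}{4\pi}$$ (12.2)

where the radiation field is

$$\mathbf{B}(t) = \frac{\hat{\mathbf{r}}\times\ddot{\mathbf{d}}(t - r/c)}{c^2r}$$ (12.3)

Integrating over directions gives

$$W = \frac{2}{3c^3}\int dt\,|\ddot{\mathbf{d}}(t)|^2 = \frac{2}{3c^3}\int_{-\infty}^{\infty}\frac{d\omega}{2\pi}\,|\ddot{\mathbf{d}}_\omega|^2$$ (12.4)

by Parseval's theorem, or

$$\frac{dW}{d\omega} = \frac{2}{3\pi c^3}|\ddot{\mathbf{d}}_\omega|^2$$ (12.5)

per unit (positive) frequency, where $\ddot{\mathbf{d}}_\omega = \int dt\,\ddot{\mathbf{d}}(t)e^{i\omega t}$.

Figure 12.1: Geometry of an electron-ion collision. See text.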


The collision velocity and impact parameter define a characteristic time $t_* = b/v$ for changes in the electron velocity and a corresponding characteristic frequency $\omega_* \sim v/b$ at which we expect most of the radiation to be emitted. The 2nd derivative of the dipole moment is $\ddot{\mathbf{d}} = -e\dot{\mathbf{v}} = -e\mathbf{a}$, and the acceleration is an impulse of strength $a_{\rm max} = Ze^2/mb^2$ and duration $\sim t_*$. For $\omega \ll \omega_*$ then we have $\int dt\,\dot{\mathbf{v}}e^{i\omega t} \simeq \int dt\,\dot{\mathbf{v}} = \Delta\mathbf{v}$. It is not difficult to show that, for small deflection angles, the net impulse is

$$\Delta v = \frac{2Ze^2}{mbv}$$ (12.6)

and we find

$$\frac{dW(b)}{d\omega} = \begin{cases}\dfrac{8Z^2e^6}{3\pi c^3m^2v^2b^2} & \text{for } \omega \ll \omega_* = v/b \\ 0 & \text{for } \omega \gg \omega_* = v/b\end{cases}$$ (12.7)

Thus, for given $v$ and $b$ the spectrum of the radiation is flat, with strength $\propto 1/v^2b^2$ up to the cut off $\omega_*$.

The total energy emitted is $W \sim \omega_*\,dW/d\omega$ and is inversely proportional to the velocity. This is because the slower particles are accelerated for a longer time. The more rapid collisions then spread the energy over a greater bandwidth resulting in $dW/d\omega \propto 1/v^2$.
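
The impulse (12.6) can be checked by integrating the perpendicular component of the Coulomb acceleration along the undeflected straight-line path, $a_\perp(t) = (Ze^2/m)\,b/(b^2 + v^2t^2)^{3/2}$. This is a sketch in arbitrary units with $Ze^2/m = 1$; the function name, parameters and sample values are illustrative assumptions.

```python
def velocity_kick(b, v, ze2_over_m=1.0, n=200000, t_max_factor=1000.0):
    """Midpoint-rule integral of the perpendicular acceleration for a straight-line fly-by.

    b, v        : impact parameter and speed (small-deflection approximation)
    ze2_over_m  : Z e^2 / m in the chosen units
    The integration runs over |t| < t_max_factor * b / v.
    """
    t_max = t_max_factor * b / v
    dt = 2.0 * t_max / n
    total = 0.0
    for i in range(n):
        t = -t_max + (i + 0.5) * dt
        total += ze2_over_m * b / (b * b + v * v * t * t)**1.5 * dt
    return total

b, v = 2.0, 3.0
print(velocity_kick(b, v), 2.0 / (b * v))   # numerical impulse vs 2 Z e^2 / (m b v)
```

Most of the impulse is delivered within a few times $t_* = b/v$ of closest approach, consistent with the characteristic frequency $\omega_* \sim v/b$.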


12.2 Photon Discreteness 


Classically, the total energy emitted in a collision is $dW \sim e^2\dot{v}^2t_*/c^3 \sim e^6/m^2c^3b^3v$ and emerges at frequency $\omega \sim v/b$. It is interesting to compare this to the energy of a quantum of this frequency:

$$\frac{dW}{\hbar\omega} \sim \left(\frac{e^2}{\hbar c}\right)^3\left(\frac{\hbar}{mvb}\right)^2$$ (12.8)

The first factor here is the cube of the fine structure constant and is on the order of $10^{-6}$, and the second factor can be at most of order unity for any physically allowed reaction, so this number is always very small.

The energy emitted according to the classical dipole calculation is therefore less than the typical energy of any photons which are actually emitted by a huge factor. The classical calculation however correctly gives the mean energy emitted in a collision. Evidently, the rate of collision events that actually generate a photon is much less than the classical frequency of collisions.


12.3 Single-Speed Electron Stream


For a single electron with velocity $v$ passing through a cloud of static ion targets with space density $n_i$, the rate of collisions with impact parameter in the range $b$ to $b + db$ is $dN/dt = 2\pi n_ivb\,db$.

For a stream of electrons with space density $n_e$ then the rate of collisions per unit volume with impact parameter in this range is $dN/dVdt = 2\pi n_in_evb\,db$. Integrating over impact parameter gives the energy radiated per unit bandwidth per unit volume per unit time of

$$\frac{dW}{d\omega\,dV\,dt} = 2\pi n_in_ev\int db\,b\,\frac{dW(b)}{d\omega} = \frac{16n_en_iZ^2e^6}{3c^3m^2v}\ln(b_{\rm max}/b_{\rm min}).$$ (12.9)

The upper limit $b_{\rm max}$ arises from the requirement $\omega < \omega_*$, or

$$b \lesssim b_{\rm max}(\omega, v) = v/\omega.$$ (12.10)


What sets the lower limit $b_{\rm min}$? At low energies it is the condition that the scattering angle be small, but for energies $\gtrsim 10$ eV the lower limit is set by quantum mechanics: an electron with velocity $v$ has a de Broglie wavelength $\lambda_{\rm dB} \sim \hbar/mv$ and it makes no sense to treat the electron as point-like for impact parameters less than $\lambda_{\rm dB}$, and it is reasonable to adopt a cut-off

$$b \gtrsim b_{\rm min}(v) = \frac{\hbar}{mv}$$ (12.11)

Equation (12.9) together with (12.10, 12.11) suggests that the radiated power spectrum is only weakly (logarithmically) frequency dependent. This is true at sufficiently low frequencies, but note that $b_{\rm max}$ decreases with increasing frequency, while $b_{\rm min}$ depends only on the electron velocity. For a given electron velocity, there is a cut-off in the power spectrum at frequency $\omega$ where $b_{\rm max}(\omega, v) = b_{\rm min}(v)$, or $\hbar/mv = v/\omega$, or equivalently

$$\omega \lesssim \omega_{\rm max}(v) \sim mv^2/\hbar.$$ (12.12)


This is very reasonable, since it states that the energy of the emitted photon had better be smaller than the total kinetic energy of the electron.

The above argument gives a useful approximation to the radiated power

$$\frac{dW}{d\omega\,dV\,dt} \simeq \begin{cases}\dfrac{16n_en_iZ^2e^6}{3c^3m^2v} & \text{for } \omega \ll mv^2/\hbar \\ 0 & \text{for } \omega \gg mv^2/\hbar\end{cases}$$ (12.13)

It is conventional to encapsulate the details of a proper calculation in the 'Gaunt-factor' and write the power as

$$\frac{dW}{d\omega\,dV\,dt} = \frac{16\pi n_en_iZ^2e^6}{3\sqrt{3}\,c^3m^2v}\,g_{ff}(v, \omega)$$ (12.14)

see e.g. R+L for more details.

Note that the low-frequency asymptotic power scales inversely with velocity. This is because 
while fast and slow electrons spend the same fraction of time being accelerated, the faster ones 
spread their radiation over a greater bandwidth. 
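
The cut-offs (12.10) and (12.11) and the resulting logarithm in (12.9) can be tabulated directly. A minimal sketch, assuming standard CGS constants (not quoted in the text), taking the quantum cut-off as $\hbar/mv$, and using illustrative function names:

```python
import math

# Standard CGS constants (assumed; not quoted in the notes)
HBAR = 1.054572e-27   # reduced Planck constant [erg s]
M_E = 9.10938e-28     # electron mass [g]

def impact_parameter_limits(v, omega):
    """Return (b_min, b_max) = (hbar / (m v), v / omega), equations (12.11) and (12.10)."""
    return HBAR / (M_E * v), v / omega

def coulomb_log(v, omega):
    """ln(b_max / b_min) in equation (12.9); zero above the cut-off omega_max ~ m v^2 / hbar."""
    b_min, b_max = impact_parameter_limits(v, omega)
    if b_max <= b_min:
        return 0.0
    return math.log(b_max / b_min)

v = 1.0e9                              # electron speed [cm/s], illustrative
omega_max = M_E * v * v / HBAR         # cut-off frequency of equation (12.12)
for frac in (1e-6, 1e-3, 1.0, 10.0):
    print(frac, coulomb_log(v, frac * omega_max))
```

The logarithm varies only slowly over many decades in frequency, which is why the spectrum below the cut-off is nearly flat.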


12.4 Thermal Bremsstrahlung 

Having computed the radiated power for a stream of electrons with a single velocity all that remains to compute the power radiated by a thermally equilibrated plasma is to average over a thermal (Maxwellian) distribution of velocities:

$$d^3p(v) \propto d^3v\,\exp(-mv^2/2kT) \propto v^2\exp(-mv^2/2kT)\,dv$$ (12.15)

Figure 12.2: Spectra for thermal bremsstrahlung for two different temperatures (though assuming the same density).


To a rough approximation we can account for the cut-off by inserting the low-frequency asymptotic form for the single speed $dW/d\omega dVdt$ but limit the integration to $v > v_{\rm min}(\omega)$ such that $mv_{\rm min}^2/2 = \hbar\omega$ to give

$$\frac{dW}{d\omega\,dV\,dt} \simeq \frac{16n_en_iZ^2e^6}{3c^3m^2}\,\frac{\int_{v_{\rm min}}^{\infty}dv\,v\,e^{-mv^2/2kT}}{\int_0^{\infty}dv\,v^2e^{-mv^2/2kT}} \sim \frac{Z^2n_en_ie^6}{(mc^2)^{3/2}}(kT)^{-1/2}e^{-\hbar\omega/kT}$$ (12.16)

For low frequencies the emission scales inversely with square root temperature, consistent with the $1/v$ scaling above since the typical thermal velocity scales as $\sqrt{T}$.

Thermal bremsstrahlung spectra are sketched in figure 12.2.

Integrating this over frequency gives

$$\frac{dW}{dV\,dt} \sim \frac{Z^2n_en_ie^6\sqrt{kT}}{\hbar\,(mc^2)^{3/2}}$$ (12.17)

so the bolometric emission scales as $\sqrt{T}$.

The emission scales as the square of the density.

The free-free emissivity for a plasma is then

$$\epsilon_{ff} = \frac{dW}{dt\,dV} = 1.4\times10^{-27}\,(T/{\rm K})^{1/2}Z^2n_en_i\bar{g}$$ (12.18)

in cgs units. The appropriately averaged Gaunt factor $\bar{g}$ here is very close to unity for realistic conditions.

This 'thermal-bremsstrahlung' spectrum is valid for sufficiently high temperatures, i.e. higher than the temperature corresponding to atomic transitions for the ions in question. At lower temperatures line emission becomes important.
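
Equation (12.18) is simple enough to wrap in a function for order-of-magnitude work. In the sketch below the function name, the default Gaunt factor of unity and the example plasma parameters ($T = 10^8$ K, $n_e = n_i = 10^{-3}$ cm$^{-3}$) are illustrative assumptions rather than values from the text.

```python
import math

def free_free_emissivity(T, n_e, n_i, Z=1.0, gaunt=1.0):
    """Bolometric free-free emissivity of equation (12.18) in erg / s / cm^3.

    T         : temperature [K]
    n_e, n_i  : electron and ion number densities [cm^-3]
    Z         : ion charge in units of e
    gaunt     : frequency-averaged Gaunt factor (close to unity)
    """
    return 1.4e-27 * math.sqrt(T) * Z**2 * n_e * n_i * gaunt

# Illustrative hot, dilute hydrogen plasma
print(free_free_emissivity(1.0e8, 1.0e-3, 1.0e-3))
```

Quadrupling the temperature at fixed density doubles the emissivity, while doubling the density quadruples it.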


12.5 Thermal Bremsstrahlung Absorption 

The inverse reaction to that considered above results in thermal bremsstrahlung absorption, which can be obtained from the emissivity using Kirchhoff's law: $\alpha_\nu = j_\nu/B_\nu(T)$. The bremsstrahlung emissivity is asymptotically flat at low frequencies, whereas $B_\nu \propto \nu^2$, so the absorption is strongly frequency dependent, $\alpha_\nu \propto 1/\nu^2$, and is therefore most effective at low frequencies.


12.6 Relativistic Bremsstrahlung

We now elucidate the general features of free-free emission from a plasma where the electron velocities 
are relativistic. 

First, let’s review the steps leading to the non-relativistic result. 

• 1 electron - 1 ion. In a collision, the electron sees an impulse with $\dot u \sim Ze^2/mb^2$ lasting
a time $t_* \sim b/v$, resulting in the emission of a mean amount of energy $dW \sim e^2 \dot u^2 t_*/c^3 \sim
Z^2 e^6/m^2c^3b^3v$ with frequency $\omega \sim v/b$.

• Energy cut-off. The energy of the emitted quanta should not exceed the kinetic energy of the
electron, so $h\nu \lesssim mv^2$ or equivalently $b \gtrsim b_{\rm min} = h/mv$.

• 1 electron - many ions. The frequency of collisions with impact parameter $\lesssim b$ is $dN/dt \sim
n_i v b^2 \propto b^2$ whereas the energy $dW \propto 1/b^3$, so most energy is emitted in collisions with $b \sim b_{\rm min}$.
The rate of such collisions is $dN/dt \sim n_i v b_{\rm min}^2$ and the power output of a single electron is
therefore $dW/dt \sim n_i v Z^2 e^6/mc^3h$.

• Many electrons - many ions. The power per unit volume is $dW/dVdt = n_e\,dW/dt \sim
n_e n_i v Z^2 e^6/mc^3h$.
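
The chain of estimates in the list above can be followed numerically. The sketch below is only an order-of-magnitude illustration in cgs units: the function name, the rounded constants and the example velocity $v = 0.1c$ are assumptions introduced here, and all factors of order unity are dropped just as in the text.

```python
# Rounded cgs constants (assumed values for illustration).
E_CHARGE = 4.803e-10   # electron charge, esu
M_E = 9.109e-28        # electron mass, g
C = 2.998e10           # speed of light, cm/s
H_PLANCK = 6.626e-27   # Planck constant, erg s

def nonrelativistic_bremsstrahlung(v, n_e, n_i, Z=1):
    """Order-of-magnitude bremsstrahlung estimates for electron speed v."""
    # Energy cut-off: smallest impact parameter allowed by h*nu <~ m v^2.
    b_min = H_PLANCK / (M_E * v)
    # Energy radiated in one collision at b ~ b_min.
    dW = Z**2 * E_CHARGE**6 / (M_E**2 * C**3 * b_min**3 * v)
    # Rate of such collisions for one electron moving through the ions.
    rate = n_i * v * b_min**2
    power_per_electron = rate * dW
    power_per_volume = n_e * power_per_electron
    return b_min, dW, power_per_electron, power_per_volume

b_min, dW, p_one, p_vol = nonrelativistic_bremsstrahlung(v=0.1 * C, n_e=1.0, n_i=1.0)
print(f"b_min ~ {b_min:.2e} cm, dW ~ {dW:.2e} erg, dW/dVdt ~ {p_vol:.2e} erg/s/cm^3")
```

The product of the collision rate and the energy per collision reproduces the closed form $n_i v Z^2 e^6/mc^3h$, which is a useful check that the powers of $b_{\rm min}$ have been combined correctly.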

We now generalize this argument to the relativistic case. The approach is to compute the emission 
in the rest frame of the electron — which sees the highly foreshortened and amplified electric field 
of a relativistic ion — and then transform back to the observer frame. 

• 1 electron - 1 ion. The electron sees the electric field of a rapidly moving ion as an impulse
with $E \sim \gamma Ze/b^2$, so $\dot u \sim \gamma Ze^2/mb^2$, and lasting a time $t_* \sim b/\gamma c$. The net energy radiated
is therefore $dW \sim e^2 \dot u^2 t_*/c^3 \sim \gamma Z^2 e^6/m^2c^4b^3$ with frequency $\omega_* \sim \gamma c/b$.

• Energy cut-off. The energy of the emitted quanta in the observer frame should not exceed
the electron energy $\sim \gamma mc^2$, or, in the electron frame, $h\nu_* \lesssim mc^2$. This gives $b_{\rm min} \sim \gamma h/mc$.

• 1 electron - many ions. The electrons see the volume occupied by the ions foreshortened,
and therefore they see an oncoming stream of ions with $v \sim c$ and density $\gamma n_i$ (where $n_i$
is the ion density in the observer frame or in the rest frame of the ions). The rate of collisions
with $b \sim b_{\rm min}$ is $dN/dt \sim \gamma n_i b_{\rm min}^2 c$ and therefore the power output in the electron frame is
$dW/dt \sim \gamma n_i Z^2 e^6/mc^2h$.

• Many electrons - many ions. The 1-electron power $dW/dt$ is Lorentz invariant, and
electrons have space-density $n_e$, so the net power per unit volume is

$$\frac{dW}{dV\,dt} \sim \frac{\gamma Z^2 n_i n_e e^6}{mc^2 h} \sim \frac{Z^2 n_i n_e e^6 E}{h (mc^2)^2} \tag{12.19}$$

with most of the energy emitted at frequency $\hbar\omega \sim \gamma mc^2 = E$.

A characteristic property of relativistic bremsstrahlung is that the emissivity is proportional to
the electron energy, and therefore to the temperature $T$ for a thermalized plasma, as compared to
emissivity $\propto \sqrt{T}$ in the non-relativistic case.

There is an interesting and illuminating alternative way to look at bremsstrahlung. We saw that
the mean energy radiated in a collision (in the electron rest frame) is $dW' \sim \gamma Z^2 e^6/(m^2c^4b^3)$ and
that the energy in the observer frame is therefore $dW = \gamma\,dW'$. Now we can write this as

$$dW \sim \gamma^2 \left(\frac{Ze}{b^2}\right)^2 \left(\frac{e^2}{mc^2}\right)^2 b \tag{12.20}$$

but we recognize the first factor as the square of the ion’s Coulomb field (and therefore
on the order of the energy density of the ion’s field at distance $b$) and the second factor as the square
of the classical radius of the electron $r_0$ or, equivalently, on the order of the Thomson cross section.
Thus, we can write the energy radiated as

$$dW \sim \gamma^2\,b\,\sigma_T U_{\rm field} \tag{12.21}$$

with $U_{\rm field}$ the energy density of the external field (in the rest frame of the ion, that is). Crudely
speaking, we can describe the bremsstrahlung process as though the electron, passing close to the
ion, knocks out the energy density in a volume $b\sigma_T$ and in the process boosts this by a factor $\gamma^2$.

As we shall see, one can express the radiation power from synchrotron and from Compton scattering
in exactly the same way; it is as though the electron has size $\sigma_T$ and impacts the ambient field
(be it the field of an ion for bremsstrahlung, a magnetic field in the case of synchrotron emission,
or a randomly fluctuating ambient radiation field in the case of Comptonization) and ejects it
with a $\gamma^2$ energy boost. As we shall see when we discuss Compton scattering, this $\gamma^2$ boost factor
is simply the result of a pair of Lorentz transforms.


12.7 Applications of Thermal Bremsstrahlung 

12.7.1 Low Frequency Emission from Ionized Gas Clouds 

Bremsstrahlung is important at low frequencies since the spectrum is flat, with $j_\nu \propto n^2 T^{-1/2}$. As
discussed, at sufficiently low frequency the clouds may become optically thick to thermal bremsstrahlung
absorption. The signature of such clouds is a spectrum with $I_\nu \propto \nu^2$ at very low frequencies flattening
to $I_\nu \propto \nu^0$ at higher frequencies. This effect is seen in radio observations of HII regions.

12.7.2 Clusters of Galaxies 

• Diffuse X-ray emission from very massive clusters of galaxies looks like thermal bremsstrahlung
with $kT \sim 10$ keV. Lower mass clusters have bremsstrahlung-like continuum with iron lines
superposed.

• The inferred temperature of $\sim 10^8$ K is consistent with hot gas in hydrostatic equilibrium in
the same potential well depth as inferred from the virial theorem and the observed velocity
dispersion of $\sigma_v \sim 1000$ km/s. For these values, the kinetic energy per unit mass is the same
for a galaxy as for an electron-ion pair.

• Since emissivity scales as $n^2$ the surface brightness is the integral of $n^2$ along the line of
sight, $I \propto \int dl\,n^2$. This is known as the emission measure. This tends to be very centrally
concentrated (as compared to the projected density of galaxies for instance); X-ray observations
have become the preferred method for detecting distant clusters where optical searches become
subject to confusion.

• Bremsstrahlung emission can become an effective cooling mechanism in centrally concentrated
clusters. The net rate of energy loss per particle is $dE/dt \propto E^{1/2} n$ so the cooling time is
$t_{\rm cool} \sim E/\dot E \propto E^{1/2} n^{-1}$. This has the consequences that cooling is most effective in the center
of clusters and, for a given density, tends to be less effective in the hotter, more massive clusters.

• Many clusters have central cooling times on the order of the age of the Universe, and are 
reasonably well modeled as ‘cooling flows’ in which gas pressure support is removed from the 
gas at the center, and the outer atmosphere quasi-statically adjusts. However, a puzzle is 
where the gas which ‘drops out’ ends up. There are cooling flows with mass disappearing at 
the rate of several hundred solar masses per year, but this gas does not end up as luminous 
stars. 
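
To attach rough numbers to the cooling-time argument, the sketch below divides the thermal energy density by the free-free emissivity (12.18). It is a minimal estimate under stated assumptions: fully ionized hydrogen with thermal energy density $(3/2)(n_e + n_i)kT$, a Gaunt factor of unity, no line cooling, and illustrative central values of temperature and density that are not taken from the text.

```python
import math

K_B = 1.380649e-16   # Boltzmann constant, erg/K
YEAR = 3.156e7       # seconds per year (rounded)

def bremsstrahlung_cooling_time_years(T, n_e, n_i=None, Z=1.0, gaunt=1.0):
    """Cooling time t_cool ~ E / (dE/dt) in years for thermal bremsstrahlung."""
    if n_i is None:
        n_i = n_e  # assumption: pure ionized hydrogen
    thermal_energy_density = 1.5 * (n_e + n_i) * K_B * T            # erg cm^-3
    emissivity = 1.4e-27 * math.sqrt(T) * Z**2 * n_e * n_i * gaunt  # erg s^-1 cm^-3
    return thermal_energy_density / emissivity / YEAR

# Assumed illustrative values: T = 1e8 K with n_e = 1e-3 and 1e-2 cm^-3.
t_outer = bremsstrahlung_cooling_time_years(1e8, 1e-3)
t_core = bremsstrahlung_cooling_time_years(1e8, 1e-2)
print(f"t_cool ~ {t_outer:.1e} yr (n = 1e-3), {t_core:.1e} yr (n = 1e-2)")
```

With these assumed inputs the low-density gas has a cooling time far longer than $10^{10}$ years, while the denser gas cools in roughly that time, in line with the $t_{\rm cool} \propto E^{1/2} n^{-1}$ scaling.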

12.7.3 Bremsstrahlung from High Energy Electrons 

Gamma ray emission is detected from our galaxy which is thought to arise from bremsstrahlung
from high energy electrons.

• The electrons are typically modeled as having a power-law distribution of energies.

• The radiative energy is carried by photons with $h\nu \sim E_e$.

• Gamma-rays from the galaxy with energies in the range 30 - 100 MeV suggest an abundance
of relativistic electrons with $\gamma \sim 100$.


12.8 Problems 

12.8.1 Bremsstrahlung 

In the following assume that the electron is non-relativistic in the ion frame but that $mv^2/2 \gg 1$
Rydberg.

a. Using Larmor’s formula (see problem 5), give an order of magnitude estimate for the energy
$dW$ radiated as an electron flies past a singly charged ion with velocity $v$ and impact parameter $b$.

b. What is the characteristic frequency of the radiation emitted?

b. The foregoing is valid only for sufficiently soft encounters that the photon energy satisfies
$\hbar\omega < m_e v^2/2$. Compute, for given $v$, the impact parameter $b_{\rm min}$ such that this condition is
marginally satisfied.

c. Compute the frequency of collisions (per unit volume) with $b \sim b_{\rm min}$ for a beam of electrons
with space density $n_e$ passing through a cloud of ions with space density $n_i$, and (combining
this with the answer from part a) compute the power per unit volume.

d. Replacing the single speed $v$ with the characteristic velocity for electrons with a thermal
distribution of velocities at some temperature $T$, obtain an approximation to the thermal
bremsstrahlung bolometric emissivity $\epsilon_{ff}$ in terms of $n_i$, $n_e$, $T$ and fundamental constants.

e. Estimate the mean number of photons generated per collision with $b \sim b_{\rm min}$. Express your
answer in terms of the ‘fine structure constant’ $\alpha = q^2/\hbar c \simeq 1/137$.

Chapter 13

Synchrotron Radiation 


Synchrotron radiation is generated by the acceleration of electrons spiraling around static magnetic
fields. It is often called ‘non-thermal’ radiation (to distinguish it from emission from thermal
electrons, rather than from black-body radiation).


13.1 Equations of Motion 


The relativistic Lorentz force law gives the rate of change of the four-momentum. The time component
is

$$\frac{dE}{dt} = \frac{d(\gamma mc^2)}{dt} = -e\,\mathbf v \cdot \mathbf E = 0 \tag{13.1}$$

so the Lorentz factor $\gamma$, and therefore also the speed of the electron, are constant in time.
The space component is


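$$\frac{d\mathbf P}{dt} = \frac{d(\gamma m\mathbf v)}{dt} = e\,\mathbf B \times \mathbf v/c \tag{13.2}$$

or

$$m\gamma\frac{d\mathbf v}{dt} = \frac{e}{c}\,\mathbf B \times \mathbf v \tag{13.3}$$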


Note that t here is coordinate time. 

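The rate of change of $v_\parallel$, the component of the velocity parallel to the magnetic field, vanishes,
so $v_\parallel =$ constant.

The solution of the equations of motion for the component of the velocity perpendicular to the
field corresponds to circular motion. If the field is aligned with the $z$-axis,

$$\begin{pmatrix} r_x(t) \\ r_y(t) \end{pmatrix} = r \begin{pmatrix} \cos\omega_B t \\ \sin\omega_B t \end{pmatrix}, \qquad
\mathbf v_\perp = r\omega_B \begin{pmatrix} -\sin\omega_B t \\ \cos\omega_B t \end{pmatrix} \tag{13.4}$$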


This trajectory is a helix and the angular frequency of rotation about the field axis is

$$\omega_B = \frac{v_\perp}{r} = \frac{eB}{\gamma mc} \tag{13.5}$$

which is known as the relativistic (angular) gyro-frequency.
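
For orientation, the gyro-frequency (13.5) is easily evaluated in cgs (Gaussian) units. The helper below is a small sketch; its name, the rounded constants and the example field strength of one microgauss are assumptions made for illustration.

```python
E_CHARGE = 4.80320e-10   # electron charge, esu
M_E = 9.10938e-28        # electron mass, g
C = 2.99792e10           # speed of light, cm/s

def gyro_frequency(B, gamma=1.0, m=M_E, q=E_CHARGE):
    """Angular gyro-frequency omega_B = q B / (gamma m c) in rad/s, B in gauss."""
    return q * B / (gamma * m * C)

# Assumed example: a 1 microgauss field, for gamma = 1 and gamma = 1e4.
print(gyro_frequency(1e-6), gyro_frequency(1e-6, gamma=1e4))
```

Setting `gamma=1` gives the non-relativistic value, about $1.76 \times 10^7$ rad/s per gauss for an electron; larger Lorentz factors lower the frequency in proportion to $1/\gamma$.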

For low velocities $v \ll c$, $\gamma \to 1$ and the gyro-frequency becomes independent of the particle
energy. The (non-relativistic) gyro-frequency is

$$\omega_G = \frac{eB}{mc} \tag{13.6}$$

13.2 Total Power Radiated

The power radiated is, according to Larmor, proportional to the square of the proper acceleration,
$P = 2e^2a_0^2/3c^3$, or, in terms of the coordinate acceleration $a$ in the observer frame,

$$P = \frac{2e^2}{3c^3}\gamma^4 a^2 = \frac{2e^2}{3c^3}\gamma^4 \left(\frac{eB}{\gamma mc}\right)^2 v_\perp^2 \tag{13.7}$$

or, in terms of the classical radius of the electron $r_0 = e^2/mc^2$,

$$P = \frac{2}{3} r_0^2 c\,\beta_\perp^2 \gamma^2 B^2. \tag{13.8}$$


An alternative path to this result is to transform the magnetic field into the instantaneous frame 
of rest of the electron. The electron sees an electric field and the power can be computed much as 
for Thomson scattering. 

This is power in the rest-frame, but since dipole radiation is front-back symmetric, the radiated 
power is Lorentz invariant. 

Equation (13.8) applies for electrons of fixed ‘pitch angle’ $\alpha$, defined such that $\sin\alpha = v_\perp/v$.
Averaging (13.8) over pitch angle assuming an isotropic distribution gives

$$\langle P \rangle = \left(\frac{2}{3}\right)^2 r_0^2 c\,\beta^2 \gamma^2 B^2. \tag{13.9}$$


One can also express the power as

$$P = 2\sigma_T c\,U_{\rm mag}\,\beta^2 \gamma^2 \sin^2\alpha \tag{13.10}$$

where $U_{\rm mag} = B^2/8\pi$ is the magnetic field energy density, and $\sigma_T = 8\pi r_0^2/3$ is the Thomson
cross-section. Averaging over pitch angle and taking the highly relativistic limit,

$$\langle P \rangle = \frac{4}{3} c\,\sigma_T \gamma^2 U_{\rm mag}. \tag{13.11}$$
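
The two forms (13.10) and (13.11) can be compared numerically. The sketch below assumes cgs units and an isotropic pitch-angle distribution, for which the mean of $\sin^2\alpha$ is 2/3; the function names and the example values of $\gamma$ and $B$ are illustrative assumptions.

```python
import math

C = 2.99792e10        # speed of light, cm/s
SIGMA_T = 6.6524e-25  # Thomson cross-section, cm^2

def magnetic_energy_density(B):
    """U_mag = B^2 / (8 pi) in erg cm^-3, with B in gauss."""
    return B**2 / (8 * math.pi)

def synchrotron_power(gamma, B, sin2_alpha):
    """Eq. (13.10): power (erg/s) radiated at a fixed pitch angle."""
    beta2 = 1.0 - 1.0 / gamma**2
    return 2 * SIGMA_T * C * magnetic_energy_density(B) * beta2 * gamma**2 * sin2_alpha

def mean_synchrotron_power(gamma, B):
    """Pitch-angle average of (13.10), using <sin^2 alpha> = 2/3."""
    return synchrotron_power(gamma, B, 2.0 / 3.0)

# Assumed example: gamma = 1e4 in a 10 microgauss field.
p_mean = mean_synchrotron_power(1e4, 1e-5)
p_limit = (4.0 / 3.0) * C * SIGMA_T * 1e4**2 * magnetic_energy_density(1e-5)
print(p_mean, p_limit)
```

For $\gamma \gg 1$ the two numbers agree to within a fractional difference of order $1/\gamma^2$, as expected from setting $\beta \to 1$ in (13.11).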


13.3 Synchrotron Cooling 


The energy loss rate for a relativistic electron is $-dE/dt = P \propto E^2$. One can define a cooling rate
$|\dot E|/E$ and a corresponding cooling time $t_{\rm cool} = E/|\dot E|$.

More precisely,

$$\frac{dE}{dt} = -\frac{4\sigma_T U_{\rm mag} E^2}{3m^2c^3} \tag{13.12}$$

which can be integrated to give

$$\frac{1}{E_f} - \frac{1}{E_i} = \frac{4\sigma_T U_{\rm mag}}{3m^2c^3}\,t. \tag{13.13}$$


This sets an upper limit to the electron energy as a function of the time since the electrons were
injected. Even if the electrons were initially infinitely energetic they will have cooled to a finite
energy $E_{\rm max}(t) = (3m^2c^3/4\sigma_T U_{\rm mag})\,t^{-1}$ after time $t$ and electrons of lower initial energy will
have $E < E_{\rm max}$.

Observing electrons of a certain energy in a given magnetic field then gives an upper limit to the 
age of the electrons (ie time since they were accelerated and injected). 
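
A short sketch of the integrated cooling law (13.13) follows. It works in cgs units with energies in erg; the function names, the example field of 10 microgauss and the elapsed time of $10^{15}$ s are assumptions chosen only to illustrate the behaviour.

```python
import math

M_E = 9.10938e-28     # electron mass, g
C = 2.99792e10        # speed of light, cm/s
SIGMA_T = 6.6524e-25  # Thomson cross-section, cm^2

def cooling_coefficient(B):
    """A = 4 sigma_T U_mag / (3 m^2 c^3), so that dE/dt = -A E^2."""
    u_mag = B**2 / (8 * math.pi)
    return 4 * SIGMA_T * u_mag / (3 * M_E**2 * C**3)

def cooled_energy(E_i, B, t):
    """Energy after time t from eq. (13.13): 1/E_f = 1/E_i + A t."""
    return 1.0 / (1.0 / E_i + cooling_coefficient(B) * t)

def max_energy(B, t):
    """Upper limit E_max(t), reached by an initially infinitely energetic electron."""
    return 1.0 / (cooling_coefficient(B) * t)

# Assumed example: B = 1e-5 G, t = 1e15 s.
E_max = max_energy(1e-5, 1e15)
gamma_max = E_max / (M_E * C**2)
print(f"E_max ~ {E_max:.2e} erg, gamma_max ~ {gamma_max:.2e}")
```

Passing `math.inf` as the initial energy reproduces `max_energy`, and any finite initial energy ends up below it, which is the statement that observed energies bound the age of the electrons.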


13.4 Spectrum of Synchrotron Radiation 

At low energies, $v \ll c$, one can compute the radiation from the spiraling electrons using the dipole
formula, and one finds that the radiation is emitted at the gyration frequency.

Figure 13.1: Because of relativistic beaming, a distant observer at R receives appreciable synchrotron
radiation from the electron only for a small fraction ($\sim 1/\gamma$) of its orbit. The leading edge
of the pulse is emitted as the particle enters the active zone at the left, and the trailing edge of the
pulse is emitted a time $\Delta t' \sim 1/(\gamma\omega_B)$ later as the particle leaves the active zone at the right. The
leading edge of the pulse has meanwhile propagated a distance $c\Delta t'$ whereas the particle has moved
a distance $v\Delta t'$, so it has almost kept up with the leading edge. Consequently, the interval between
reception of the pulse edges is on the order of $\Delta t = \Delta t' \times (1 - v/c)$ or, since $\Delta t' \sim 1/(\gamma\omega_B)$ and
$(1 - v/c) \simeq 1/(2\gamma^2)$, we have $\Delta t \sim 1/(\gamma^3\omega_B)$. Thus the frequency of the radiation detected is larger
than the orbital frequency by a factor $\sim \gamma^3$.


For large electron energies we expect the radiation to be strongly beamed along the direction of
motion with opening angle for the beam $\Delta\theta \sim 1/\gamma$. An observer will then receive pulses of radiation
of period $\tau = 2\pi/\omega_B$ but of duration $\Delta t \ll \tau$. In fact, the time-scale for the pulses is $\Delta t \sim \tau/\gamma^3$
and consequently the radiation emerges at frequency

$$\omega_c \sim \gamma^3 \omega_B. \tag{13.14}$$


This can be understood qualitatively as follows. The beaming effect means that a given observer
will see radiation from the particle only for a small fraction $\sim 1/\gamma$ of its orbit, ie for $|\theta| \lesssim 1/\gamma$ in
figure 13.1. This is when it is moving almost directly towards the observer, and consequently there
is a big Doppler effect: the emission of the leading edge of the pulse precedes the emission of the
trailing edge by a coordinate time interval $\Delta t' \sim 1/(\omega_B\gamma)$, but the latter event is closer to the
observer by an amount $\Delta r = v\Delta t'$, so the interval between the reception of the leading and trailing
edges is

$$\Delta t = (1 - \beta)\Delta t' \sim (1 - \beta)/(\omega_B\gamma) \sim 1/(\omega_B\gamma^3) \tag{13.15}$$

where we have used $(1 - \beta) \simeq 1/(2\gamma^2)$ for $\gamma \gg 1$.

Thus, we expect to receive radiation up to frequencies at most of order $\omega_c$ given by (13.14), which
is often called the critical frequency.
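
The approximation $(1 - \beta) \simeq 1/(2\gamma^2)$ and the resulting $\gamma^3$ factor are easy to check numerically. The following is a minimal sketch with frequencies in arbitrary units; the function names and the choice $\gamma = 100$ are illustrative assumptions.

```python
import math

def doppler_compression(gamma):
    """Exact factor (1 - beta) by which the pulse is shortened on reception."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    return 1.0 - beta

def received_pulse_duration(gamma, omega_B):
    """Delta t = (1 - beta) * Delta t', with Delta t' ~ 1 / (gamma omega_B)."""
    return doppler_compression(gamma) / (gamma * omega_B)

def critical_frequency(gamma, omega_B):
    """Order-of-magnitude critical frequency, eq. (13.14)."""
    return gamma**3 * omega_B

gamma = 100.0
print(doppler_compression(gamma), 1.0 / (2 * gamma**2))
print(received_pulse_duration(gamma, 1.0) * critical_frequency(gamma, 1.0))
```

The last product is close to 1/2, so the received pulse duration is the inverse of $\gamma^3\omega_B$ up to the factor of order unity that the estimate drops.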


















Figure 13.2: Geometry for calculation of the spectrum of synchrotron radiation. An electron orbits
in the $x$-$y$ plane with velocity $\mathbf v$. The vector $\mathbf R$ points to the distant observer.


13.4.1 Pulse Profile 


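This result can also be obtained using the Liénard-Wiechert potentials, and this also allows us to
infer the shape of the spectrum of synchrotron radiation.

For an observer at great distances the magnetic (vector) potential is

$$\mathbf A(\mathbf R, t) = \frac{q}{c}\left[\frac{\mathbf v}{(1 - \mathbf n \cdot \mathbf v/c)\,|\mathbf R - \mathbf r|}\right]_{\rm ret}
\simeq \frac{q}{cR}\,\frac{\mathbf v(t')}{1 - \mathbf n \cdot \mathbf v(t')/c} \tag{13.16}$$

where $\mathbf r(t')$ is the trajectory of the electron, $\mathbf v = \dot{\mathbf r}$, $\mathbf n = \hat{\mathbf R}$ and where the retarded time $t'$ is the
solution of

$$t' = t - R/c + \mathbf n \cdot \mathbf r(t')/c \tag{13.17}$$

where $R$ is now considered constant.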

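For the geometry shown in figure 13.2 we have $\mathbf r = r(\cos\omega_B t', \sin\omega_B t', 0)$ and $\mathbf n = (0, \cos\theta, \sin\theta)$
so $\mathbf n \cdot \mathbf r(t') = r\cos\theta\sin\omega_B t'$. The relation between observer time $t$ and retarded time $t'$ is then,
using $r = v/\omega_B$,

$$t - R/c = t' - \frac{\beta}{\omega_B}\cos\theta\sin\omega_B t'. \tag{13.18}$$

This function is plotted in figure 13.3.

The ‘Doppler factor’ appearing in the LW potentials is

$$\kappa = 1 - \mathbf n \cdot \mathbf v/c = \frac{dt}{dt'} = 1 - \sqrt{1 - (1/\gamma)^2}\,\cos\theta\cos\omega_B t' \tag{13.19}$$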


where we have used $\beta = \sqrt{1 - (1/\gamma)^2}$. Evidently, the factor $\kappa$ becomes very small (and consequently
the potential becomes very large) provided $\sqrt{1 - 1/\gamma^2}$, $\cos\theta$ and $\cos\omega_B t'$ are all very close
to unity. Equivalently, a small $\kappa$ requires that $1/\gamma$, $\theta$ and $\omega_B t'$ are all very small compared to unity.
If so,

$$\kappa \simeq \frac{1}{2}\left[(1/\gamma)^2 + \theta^2 + \omega_B^2 t'^2\right] \tag{13.20}$$

where we have expanded the trigonometric functions in (13.19).
Performing the analogous expansion on (13.18) gives

$$t - R/c \simeq \frac{1}{2}(\gamma^{-2} + \theta^2)\,t'\left[1 + \frac{1}{3}\frac{\omega_B^2 t'^2}{\gamma^{-2} + \theta^2}\right] \tag{13.21}$$

from which we see that there is a characteristic time-scale

$$t'_*(\gamma, \theta) = \sqrt{\gamma^{-2} + \theta^2}/\omega_B. \tag{13.22}$$

For $t' \ll t'_*$ we have a linear relation $t \propto t'$ whereas for $t' \gg t'_*$, $t \propto t'^3$. More specifically,
measuring $t$ from the arrival time $R/c$ and dropping numerical factors,

$$t \sim \begin{cases} (\gamma^{-2} + \theta^2)\,t' & t' \ll t'_* \\ \omega_B^2 t'^3 & t' \gg t'_* \end{cases} \tag{13.23}$$

The time-scale $t'_*$ corresponds to a characteristic observer time interval $t_* = (\gamma^{-2} + \theta^2)^{3/2}/\omega_B$.
Note that for $\theta = 0$ (or, more generally, $\theta \lesssim 1/\gamma$) this is just the inverse of the critical frequency:
$t_* \simeq 1/\omega_c$.

Figure 13.3: Observer time as a function of retarded time, $t = t' - \beta\cos\theta\,\sin(\omega_B t')/\omega_B$, as given by
(13.18). Two complete rotations are shown, for various values of $\beta\cos\theta$. At high velocities and small
angles, observer time becomes almost stationary with respect to retarded time. Consequently, the
function $\kappa$ becomes very small, and the potentials and fields become very strong, resulting in a short
pulse of radiation. It appears that the function tends to a well defined limit as $\beta\cos\theta \to 1$. However,
this is somewhat misleading; if we examine in detail the nearly stationary region, we find that the
behaviour is very sensitive to how close $\beta\cos\theta$ is to unity.

Figure 13.4: Relation between retarded time $t'$ and observer time $t$.


We can turn (13.23) around to obtain the asymptotic form for the retarded time as a function
of observer time:

$$t' \sim \begin{cases} t/(\gamma^{-2} + \theta^2) & t \ll t_* \\ (t/\omega_B^2)^{1/3} & t \gg t_* \end{cases} \tag{13.24}$$

as sketched in figure 13.4.
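
The small-angle expansion of the relation between observer time and retarded time can be checked against the exact expression (13.18). The sketch below measures $t$ from the arrival time $R/c$ and works in units with $\omega_B = 1$; the function names and the sample values $\gamma = 50$, $\theta = 0.01$ are assumptions made for illustration.

```python
import math

def observer_time_exact(t_ret, gamma, theta, omega_B=1.0):
    """Exact relation (13.18): t = t' - (beta/omega_B) cos(theta) sin(omega_B t')."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    return t_ret - beta * math.cos(theta) * math.sin(omega_B * t_ret) / omega_B

def observer_time_expanded(t_ret, gamma, theta, omega_B=1.0):
    """Expansion (13.21), valid for 1/gamma, theta and omega_B t' all << 1."""
    a = gamma**-2 + theta**2
    return 0.5 * a * t_ret * (1.0 + (omega_B * t_ret) ** 2 / (3.0 * a))

gamma, theta = 50.0, 0.01
t_star_ret = math.sqrt(gamma**-2 + theta**2)  # characteristic retarded time (omega_B = 1)
for t_ret in (0.2 * t_star_ret, t_star_ret, 2.0 * t_star_ret):
    print(t_ret, observer_time_exact(t_ret, gamma, theta),
          observer_time_expanded(t_ret, gamma, theta))
```

Well below the characteristic retarded time the linear term dominates, and above it the cubic term takes over, which is the change of regime summarized in (13.23) and (13.24).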


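We can now compute, for example, the $x$-component of the potential $A_x$. The numerator in
(13.16) then contains the $x$-component of the velocity, of magnitude $v\sin\omega_B t' \simeq v\omega_B t'$ since we can
safely assume that $t' \ll 1/\omega_B$ (we drop the overall sign, which does not affect the radiated power).
The potential is therefore given by

$$A_x(t) = \frac{qv\,\omega_B t'}{cR}\,\frac{2}{(1/\gamma)^2 + \theta^2 + \omega_B^2 t'^2} \tag{13.25}$$

with $t'$ given by (13.21). This has the asymptotic behaviour

$$A_x(t) \simeq \frac{qv}{cR} \times \begin{cases} 2\omega_B t/(\gamma^{-2} + \theta^2)^2 & t \ll t_* \\ 2(\omega_B t)^{-1/3} & t \gg t_* \end{cases} \tag{13.26}$$

as sketched in figure 13.5.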

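The magnetic field is $\mathbf B = \nabla \times \mathbf A$. At large distances, and for our geometry, this gives $B_z =
(1/c)\,dA_x/dt$. The field is a time symmetric pulse with

$$B_z \simeq \frac{qv}{c^2R} \times \begin{cases} 2\omega_B/(\gamma^{-2} + \theta^2)^2 & t \ll t_* \\ -(2/3)\,\omega_B^{-1/3}\,t^{-4/3} & t \gg t_* \end{cases} \tag{13.27}$$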


The form of the field is shown in figure 13.6. The field in the negative ‘wings’ falls off rapidly, and
the contribution to the net energy flux from $t \gg t_*$ is small. The characteristic frequency in the
spectrum of a single pulse from an electron of Lorentz factor $\gamma$ as viewed by an observer along a
direction at angle $\theta$ out of the orbital plane is therefore

$$\omega_* = \frac{\omega_B}{(\gamma^{-2} + \theta^2)^{3/2}}. \tag{13.28}$$

Since there are no features in the potential or the field on scales smaller than $t_*$ the power falls
rapidly for $\omega \gg 1/t_*$.

Figure 13.5: Sketch of the form of the potential for a pulse of synchrotron radiation from a highly
relativistic electron with Lorentz factor $\gamma$ as seen by an observer lying an angle $\theta$ out of the plane
of the orbit.



Figure 13.6: Schematic profile of the magnetic (or electric) field for a pulse of synchrotron radiation. 

13.4.2 Low-Frequency Power Spectrum 

While there is relatively little power at $\omega \ll \omega_*$ it is still of some interest to obtain the form of the
power spectrum. The low-frequency power spectrum is dominated by the wings in the pulse. We
can model these as

$$B(t, \theta) \sim \frac{qv}{c^2R}\,\omega_B^{-1/3}\,t^{-4/3} \quad {\rm for}\ t \gg t_* \tag{13.29}$$

where $t_* \sim \theta^3/\omega_B$. The transform of the field is then

$$B_\omega(\theta) = \int dt\,B(t, \theta)\,e^{i\omega t} \sim \frac{qv}{c^2R}\left(\frac{\omega}{\omega_B}\right)^{1/3} \quad {\rm for}\ \omega \ll \omega_* \tag{13.30}$$

with rapid attenuation at frequencies higher than $\omega_*$. We have used here
$\int dt\,t^{-4/3}e^{i\omega t} = \omega^{1/3}\int dy\,y^{-4/3}e^{iy}$ where the latter integral is some number of order unity.

It would appear then that the power spectrum, which is proportional to $B_\omega^2$, is $P(\omega) \propto \omega^{2/3}$.

Figure 13.7: Sketch of the form of the power spectrum of synchrotron radiation for mono-energetic
electrons.

This is true for an observer at a specific angle relative to the orbit plane. However, in general, we
have a distribution of pitch angles, so we really want to integrate over the distribution. Equivalently,
we want to integrate over possible angles for the observer. Then we need to allow for the fact that
the cut-off frequency $\omega_*$ is angle dependent. As we will now show, this results in more observed
power at low frequencies.

The total energy emitted in a single gyration is obtained by integrating the Poynting flux:

$$\Delta W = R^2 \int d\theta \int dt\,\frac{cB^2(t, \theta)}{4\pi} = \frac{cR^2}{8\pi^2} \int d\omega \int d\theta\,B_\omega^2(\theta) \tag{13.31}$$


by Parseval’s theorem. The cut-off frequency is $\omega_* = \omega_B/(1/\gamma^2 + \theta^2)^{3/2}$. At some observed frequency
$\omega$, an observer will see appreciable radiation only if the angle $\theta$ is less than some $\theta_{\rm max}$ such that
$\omega_*(\theta_{\rm max}) = \omega$. If $\omega \ll \omega_c$ this means that $\theta_{\rm max}(\omega) \sim (\omega_B/\omega)^{1/3}$. At larger angles, $\omega_* < \omega$ and the
radiation is suppressed. To the level of sophistication that we are working, we can then replace the
$\int d\theta\,B_\omega^2(\theta)$ by $\theta_{\rm max}$ times the asymptotic expression (13.30), to give

$$\Delta W \sim \int d\omega\,\frac{q^2v^2}{c^3}\left(\frac{\omega}{\omega_B}\right)^{1/3}. \tag{13.32}$$


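The total power is $P = \omega_B\Delta W/2\pi$, since the pulses occur with frequency $\omega_B/2\pi$, so we have
$P = \int d\omega\,P(\omega)$ where, taking $v \simeq c$, the power spectrum is

$$P(\omega) \sim \frac{q^2}{c}\,\omega_B^{2/3}\,\omega^{1/3}. \tag{13.33}$$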


The low-frequency power-spectrum is therefore a power-law: $P(\omega) \propto \omega^{1/3}$. The general form for
the synchrotron power spectrum for electrons of a single energy is sketched in figure 13.7.

Note that $\omega_B = qB/\gamma mc = qBc/E$. Thus a highly relativistic electron will have the same
low-frequency power as a highly relativistic proton of the same energy; the low-frequency power
depends only on the energy of the particle, and not its velocity (assuming $\gamma \gg 1$ at least). The
high-frequency cut-off at $\omega_* = \gamma^3\omega_B = E^2eB/m^3c^5$, on the other hand, depends also on the mass
of the particle. This is because the cut-off, unlike the low-$\omega$ power, is critically dependent on how
close the particle velocity is to the speed of light.

Finally, a puzzle: Compare the spectrum obtained here with the low-energy spectrum from
bremsstrahlung. In that case we argued that a collision will produce a pulse of radiation, and so
the low-frequency spectrum should be flat, $P(\omega) \propto \omega^0$. Here we also have a pulse of radiation.
Why do we not find a flat, low-$\omega$ power spectrum? The answer lies in the shape of the pulse. For
bremsstrahlung, the time integral of the field $\int dt\,B(t)$ is non-zero, and the transform of the field at
low frequencies is constant. Here the integral of the pulse vanishes, $\int dt\,B(t) = 0$, and there is no
analogous flat-spectrum component.

13.5 Power-Law Electrons 

To obtain a realistic synchrotron spectrum we need to convolve the mono-energetic electron spectrum 
derived above with the energy distribution function for the electrons. 

An interesting model is where the electrons have a power law distribution in energy:

$$n(\gamma)\,d\gamma = C\gamma^{-p}\,d\gamma. \tag{13.34}$$

The synchrotron power spectrum will then be a superposition of copies of the spectrum for mono-energetic
electrons derived above, with appropriate scaling of amplitude and frequency.

To find the form of the composite power spectrum, we can argue as follows: The number of
electrons in a logarithmic interval of $\gamma$ is

$$dn \sim C\gamma^{1-p}\,d\ln\gamma \tag{13.35}$$

and the power radiated by a single electron is $P \sim \gamma^2 c\sigma_T B^2 \propto \gamma^2$ and appears at frequency
$\omega \sim \gamma^2\omega_G$. This means that frequency $\omega$ corresponds to energy $\gamma = \sqrt{\omega/\omega_G}$ and therefore the power
radiated by the electrons with energy $\sim \gamma$ is

$$dP = \omega P(\omega)\,d\ln\omega \propto \gamma^{3-p}\,d\ln\gamma \propto \omega^{(3-p)/2}\,d\ln\omega. \tag{13.36}$$

It then follows that the composite power spectrum is a power-law

$$P_{\rm tot}(\omega) \propto \omega^{-s} \tag{13.37}$$

with spectral index $s = (p - 1)/2$.
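
The scaling argument can be mimicked in a few lines. The sketch below makes the crude assumption that each electron emits all of its power $\propto \gamma^2$ at the single frequency $\omega = \gamma^2\omega_G$ (a delta-function stand-in for the mono-energetic spectrum), changes variables from $\gamma$ to $\omega$, and measures the logarithmic slope of the result. The function names and the value $p = 2.6$ are illustrative assumptions.

```python
import math

def spectral_index(p):
    """Spectral index s = (p - 1)/2 for electron index p."""
    return (p - 1.0) / 2.0

def composite_power(omega, p, omega_G=1.0, norm=1.0):
    """P_tot(omega) = n(gamma) P(gamma) dgamma/domega with omega = gamma^2 omega_G."""
    gamma = math.sqrt(omega / omega_G)
    n_gamma = norm * gamma ** (-p)               # power-law electron distribution
    power = gamma**2                             # single-electron power, arbitrary units
    dgamma_domega = 1.0 / (2.0 * gamma * omega_G)
    return n_gamma * power * dgamma_domega

def log_slope(p, w1=1e2, w2=1e4):
    """Logarithmic slope d ln P_tot / d ln omega between two frequencies."""
    return (math.log(composite_power(w2, p)) - math.log(composite_power(w1, p))) / (
        math.log(w2) - math.log(w1))

p = 2.6
print(log_slope(p), -spectral_index(p))
```

The measured slope equals $-(p - 1)/2$, so an electron index of 2.6 corresponds to a spectral index of 0.8 in this simplified picture.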

Such models give a reasonable description of emission from radio-galaxies, which typically have
power-law-like spectra extending over a substantial range of frequencies with spectral index $s \sim 0.8$.

Radio galaxies also often display a cut-off at low frequencies due to synchrotron self-absorption.
See R+L for further discussion.

Chapter 14

Compton Scattering 


Compton scattering is the generalization of Thomson scattering to allow for the recoil effect if the 
photon energy is not completely negligible compared to the electron rest mass. 

For Thomson scattering we found that the scattered radiation had the same frequency as the
incident radiation, so the energies of the quanta are unchanged, and we obtained the differential
cross-section

$$\frac{d\sigma}{d\Omega} = \frac{1}{2}r_0^2(1 + \cos^2\theta) = \frac{3\sigma_T}{16\pi}(1 + \cos^2\theta) \tag{14.1}$$

where $\theta$ is the angle between the directions of the incoming and scattered photons.

The generalization of this involves two modifications 

• Recoil of the electron — this can be understood from simple relativistic kinematics. 

• The cross-section is modified (the Klein-Nishina formula) if the photon energy in the rest frame 
of the electron exceeds the electron rest mass energy. This requires quantum electrodynamics. 


14.1 Kinematics of Compton Scattering 

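Suitable null 4-vectors to represent the initial and final photon 4-momenta are

$$\vec P_{\gamma i} = \frac{\epsilon}{c}\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} \quad {\rm and} \quad
\vec P_{\gamma f} = \frac{\epsilon_1}{c}\begin{pmatrix} 1 \\ \cos\theta \\ \sin\theta\cos\phi \\ \sin\theta\sin\phi \end{pmatrix} \tag{14.2}$$

where $\epsilon$ denotes the energy, the subscript 1 denotes the outgoing photon state (ie after one scattering),
and we have chosen the initial photon to have momentum parallel to the $x$-axis.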

Similarly, the 4-momenta for the initial and final electron states are

$$\vec P_{ei} = \begin{pmatrix} mc \\ 0 \\ 0 \\ 0 \end{pmatrix} \quad {\rm and} \quad
\vec P_{ef} = \begin{pmatrix} E/c \\ P_x \\ P_y \\ P_z \end{pmatrix} \tag{14.3}$$

where we are working in the rest-frame of the initial electron. These 4-momenta are illustrated in
figure 14.1.

Conservation of the total 4-momentum is

$$\vec P_{\gamma i} + \vec P_{ei} = \vec P_{\gamma f} + \vec P_{ef}. \tag{14.4}$$

If we specify the incoming momenta $\vec P_{ei}$ and $\vec P_{\gamma i}$ then the outgoing 4-momenta contain six free
parameters, $\epsilon_1$, $\theta$ and $\phi$ for the photon and $\mathbf P_f$ for the electron (with the electron energy then fixed
by the mass-shell condition $E^2 = p^2c^2 + m^2c^4$). If we specify the direction $\theta$, $\phi$ of the outgoing
photon, say, then equation (14.4) provides us with the necessary four constraints to fully determine
the collision (ie the energy of the photon and the 3-momentum of the outgoing electron).

Figure 14.1: Four-momenta of particles involved in a Compton scattering event, working in a frame
such that the electron is initially at rest and the initial photon direction is $\mathbf n = (1, 0, 0)$. The final
photon direction is $\mathbf n_1 = (\cos\theta, \sin\theta\cos\phi, \sin\theta\sin\phi)$. Before the collision the photon has
4-momentum $(\epsilon/c)(1, \mathbf n)$ and the electron $(mc, 0)$; after the collision the electron has $(E_f/c, \mathbf P_f)$.

If we simply want to determine the energy of the outgoing photon $\epsilon_1$, then we only need one
equation. A convenient way to throw out the unwanted information $\mathbf P_f$ is to take the norm of $\vec P_{ef}$.
If we orient our spatial coordinate system such that $\phi = 0$, so the outgoing photon momentum lies
in the $x$-$y$ plane, then $\vec P_{ef}$ is

$$\vec P_{ef} = \vec P_{\gamma i} + \vec P_{ei} - \vec P_{\gamma f} = \frac{1}{c}\begin{pmatrix} \epsilon + mc^2 - \epsilon_1 \\ \epsilon - \epsilon_1\cos\theta \\ -\epsilon_1\sin\theta \\ 0 \end{pmatrix} \tag{14.5}$$

and the mass-shell requirement $E_f^2 = c^2|\mathbf P_f|^2 + m^2c^4$ becomes

$$(\epsilon + mc^2 - \epsilon_1)^2 = (\epsilon - \epsilon_1\cos\theta)^2 + (\epsilon_1\sin\theta)^2 + m^2c^4 \tag{14.6}$$

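This is a single equation one can solve for $\epsilon_1$ given $\epsilon$ and $\theta$. Expanding out the products and
reordering gives

$$\epsilon_1 = \frac{\epsilon}{1 + \frac{\epsilon}{mc^2}(1 - \cos\theta)} \tag{14.7}$$

and expressing the photon energies in terms of wavelength, $\epsilon = h\nu = hc/\lambda$, gives

$$\lambda_1 - \lambda = \lambda_c(1 - \cos\theta) \tag{14.8}$$

where the parameter

$$\lambda_c = \frac{h}{mc} \tag{14.9}$$

is the Compton wavelength.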
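
The two forms of the Compton relation can be cross-checked numerically. The sketch below works with photon energies in keV; the function names, the rounded constants and the sample energies and angles are illustrative assumptions.

```python
import math

MC2_KEV = 510.99895        # electron rest energy, keV
HC_KEV_CM = 1.23984198e-7  # h c in keV cm

def scattered_energy(eps, theta, mc2=MC2_KEV):
    """Eq. (14.7): outgoing photon energy for a stationary electron."""
    return eps / (1.0 + (eps / mc2) * (1.0 - math.cos(theta)))

def compton_wavelength(mc2=MC2_KEV):
    """Eq. (14.9): lambda_c = h / (m c) = h c / (m c^2), in cm."""
    return HC_KEV_CM / mc2

# Back-scattering (theta = pi) of a photon with eps = m c^2 leaves one third of the energy.
print(scattered_energy(MC2_KEV, math.pi), MC2_KEV / 3.0)

# Wavelength form (14.8) for a 100 keV photon scattered through 90 degrees.
eps, theta = 100.0, math.pi / 2
shift = HC_KEV_CM / scattered_energy(eps, theta) - HC_KEV_CM / eps
print(shift, compton_wavelength() * (1.0 - math.cos(theta)))
```

For photon energies far below 511 keV the returned energy is almost equal to the input energy, which is the effectively elastic (Thomson) limit.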

Equations (14.7, 14.8) describe the energy loss for photons scattering off stationary electrons.
They show that the collision is effectively elastic (ie $\epsilon_1 \simeq \epsilon$) if $\epsilon \ll mc^2$.

14.2 Inverse Compton Effect

Also of interest is the change in energy of photons scattering off moving electrons. This can be 
obtained by a) Lorentz transforming the photon 4-momentum to obtain its value in the electron 
rest frame b) computing the energy change of the photon as above and c) Lorentz transforming the 
outgoing photon 4-momentum back to the ‘laboratory’ or observer frame. 

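Let us take the electron to be moving in the $x$-direction and take the initial photon 4-momentum
to be $\vec P_{\gamma i} = (\epsilon/c)(1, \cos\theta, \sin\theta, 0)$; then transforming to the rest frame (primed quantities) gives,
dropping the common factor $1/c$,

$$\begin{pmatrix} \epsilon' \\ \epsilon'\cos\theta' \\ \epsilon'\sin\theta' \\ 0 \end{pmatrix} =
\begin{pmatrix} \gamma & -\beta\gamma & & \\ -\beta\gamma & \gamma & & \\ & & 1 & \\ & & & 1 \end{pmatrix}
\begin{pmatrix} \epsilon \\ \epsilon\cos\theta \\ \epsilon\sin\theta \\ 0 \end{pmatrix} =
\begin{pmatrix} \epsilon\gamma(1 - \beta\cos\theta) \\ \epsilon\gamma(\cos\theta - \beta) \\ \epsilon\sin\theta \\ 0 \end{pmatrix} \tag{14.10}$$

so the rest frame incoming photon energy is

$$\epsilon' = \epsilon\gamma(1 - \beta\cos\theta). \tag{14.11}$$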

This photon will scatter to an outgoing photon with energy $\epsilon'_1 \simeq \epsilon'$ provided $\epsilon' \ll mc^2$, and with
some direction $\theta'_1$, $\phi'_1$. Applying the inverse Lorentz boost to get back to the lab-frame gives

$$\epsilon_1 = \epsilon'_1\gamma(1 + \beta\cos\theta'_1) \tag{14.12}$$

and so the initial and final energies in the lab-frame are related by

$$\epsilon_1 = \epsilon\gamma^2(1 + \beta\cos\theta'_1)(1 - \beta\cos\theta). \tag{14.13}$$


The reason for writing the energy change in this seemingly awkward way (with one angle
in the rest-frame system and one in the lab-frame system) is that both $\theta$ and $\theta'_1$ have broad
distributions. The angle $\theta$ is the direction of the incoming photon in the lab-frame, and the distribution
of $\mu = \cos\theta$ is flat. The angle $\theta'_1$ is the direction of the scattered photon in the electron-frame.
This is not isotropic, but still has a broad distribution. In contrast, for high energy electrons, both
$\theta'$ and $\theta_1$ have very narrow distributions as illustrated in figure 14.2.


It then follows that for typical collisions, the photon energy is boosted by a factor $\sim \gamma^2$:

$$\epsilon_1 \sim \gamma^2\epsilon. \tag{14.14}$$

The only exceptions to this rule are for incoming photons with $\theta \simeq 0$ (ie propagating in the same
direction as the electron) or if the outgoing photon has $\theta'_1 \simeq \pi$ (ie the scattered photon direction is
opposite to the electron velocity), but these are special cases.

This result can also be understood in terms of ‘beaming’ (see figure 14.2). Consider some isotropic
or nearly isotropic photon gas and a rapidly moving electron with $\mathbf v = \beta c\,\hat{\mathbf x}$. Boosting into the
electron frame we find that the electron sees a highly anisotropic radiation field, with most of the
photons having momenta parallel to the direction $-\hat{\mathbf x}$. These photons get scattered approximately
isotropically in the electron frame, and boosting back to the laboratory frame we find that the
outgoing photons are tightly beamed along the $+\hat{\mathbf x}$ direction.

This process is an extremely efficient way to boost low energy photons to high energies, and 
(since the phrase ‘Compton scattering’ was initially used to describe the loss of photon energy in 
colliding off cold electrons) is called inverse Compton scattering. 

The results above are valid only if $\epsilon' \ll mc^2$, which means they are restricted to final photon
energies $\epsilon_1 \ll \gamma mc^2$.
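
The energy relation (14.13) can be explored with a short sketch. The averaging below assumes, purely for illustration, that both $\cos\theta$ (lab frame) and $\cos\theta'_1$ (electron frame) are uniformly distributed and independent; this ignores the Thomson angular dependence and the rate weighting of incoming directions, so it is a rough check of the $\gamma^2$ scaling rather than a proper calculation. The function names and $\gamma = 10$ are assumptions.

```python
import math

def energy_ratio(gamma, cos_theta, cos_theta1_prime):
    """Eq. (14.13): eps_1/eps for incoming lab angle theta and outgoing
    rest-frame angle theta_1', valid in the Thomson limit eps' << m c^2."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    return gamma**2 * (1.0 + beta * cos_theta1_prime) * (1.0 - beta * cos_theta)

def mean_energy_ratio(gamma, n=200):
    """Midpoint average of (14.13) over uniform, independent cosines (assumed)."""
    mus = [-1.0 + (i + 0.5) * 2.0 / n for i in range(n)]
    total = sum(energy_ratio(gamma, mu, mu1) for mu in mus for mu1 in mus)
    return total / n**2

gamma = 10.0
print(mean_energy_ratio(gamma))         # close to gamma^2 = 100
print(energy_ratio(gamma, -1.0, 1.0))   # head-on collision, close to 4 gamma^2
print(energy_ratio(gamma, 1.0, -1.0))   # photon overtaken and scattered backwards
```

The head-on case gives nearly $4\gamma^2$, the averaged case gives $\gamma^2$, and the special geometries mentioned in the text give a ratio far below unity.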


14.3 Inverse Compton Power 

Imagine a cloud containing hot electrons and radiation being scattered. What is the rate at which 
energy is given to the radiation field by inverse compton scattering off the rapidly moving electrons? 
















Figure 14.2: The $\gamma^2$ energy boost factor can be understood in terms of relativistic beaming. Upper
left shows an electron moving with velocity $\mathbf v$ in the lab frame and incoming photons which are
isotropically distributed. Upper right panel shows the incoming quanta in the rest-frame of the
electron. They are now highly anisotropic, so the electron sees them as nearly ‘head-on’ and their
typical energies are boosted by a factor $\sim \gamma$. Lower left shows the photons after scattering. They
are now approximately isotropic in the electron frame, and have roughly the same energy as they
had before being scattered. Lower right panel shows the scattered photons in the lab-frame. They
are now again highly collimated, and their typical energies have been boosted by a further factor
$\sim \gamma$, so in the lab-frame the overall typical energy is boosted by a factor $\gamma^2$.


To answer this we transform to get the energy density $u'$ in the electron rest frame and then use
the Thomson cross-section to give the power scattered as $P' = c\sigma_T u'$, which is Lorentz invariant
since the scattered radiation is front-back symmetric in the electron frame, and which therefore also
gives the rate at which energy is scattered into the radiation field in the lab-frame.

To make the transformation, recall that one can write the energy density for radiation in a range
of frequencies $d\nu$ and range of directions $d\Omega$ in terms of the phase-space density $f(\mathbf{r}, \mathbf{p})$ as

$$u_\nu(\Omega)\,d\nu\,d\Omega = f(\mathbf{r}, \mathbf{p})\,E\,d^3p = E^2 f(\mathbf{r}, \mathbf{p})\frac{d^3p}{E} = h^2\nu^2 f(\mathbf{r}, \mathbf{p})\frac{d^3p}{E} \qquad (14.15)$$

with $E = h\nu$, $p = h\nu/c$ and $d^3p = p^2\,dp\,d\Omega = (h/c)^3\nu^2\,d\nu\,d\Omega$. But $f(\mathbf{r}, \mathbf{p})$ (which here is actually
independent of $\mathbf{r}$) and $d^3p/E$ are both Lorentz invariant quantities, so we have

$$u'_{\nu'}(\Omega')\,d\nu'\,d\Omega' = (\nu'/\nu)^2\,u_\nu(\Omega)\,d\nu\,d\Omega. \qquad (14.16)$$

The reason for two powers of $\nu'/\nu$ here is that the energy $E = h\nu$ of each photon gets boosted
by one power of $\nu'/\nu$ and the rate at which photons pass by an observer also increases by the same
factor (if photons pass at some number per period of the radiation in one frame then
they pass at the same number per period in all frames).

Now the frequency factor is simply given by the Doppler formula $\nu'/\nu = \gamma(1 - \beta\cos\theta)$, which
depends only on the speed of the electron and the direction $\theta$ of the photon in the lab-frame, so we
have for the total energy density in the electron frame

$$\int d\Omega' \int d\nu'\, u'_{\nu'}(\Omega') = \gamma^2 \int d\Omega\, (1 - \beta\cos\theta)^2 \int d\nu\, u_\nu(\Omega) \qquad (14.17)$$



and if the radiation is isotropic in the lab-frame then

$$u' = \gamma^2 u \int \frac{d\Omega}{4\pi}\,(1 - \beta\cos\theta)^2 = \gamma^2 u\,(1 + \beta^2/3). \qquad (14.18)$$


Using $\gamma^2 = 1/(1 - \beta^2)$ we can write this as $u' = (4/3)\,u\,(\gamma^2 - 1/4)$ so the rate at which energy is
scattered into the radiation field is

$$P_+ = P'_+ = c\sigma_T u' = \frac{4}{3}\,c\sigma_T u\,(\gamma^2 - 1/4). \qquad (14.19)$$
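
As a quick numerical sanity check of (14.18) and (14.19), the following minimal sketch averages $(1 - \beta\cos\theta)^2$ over directions with a simple midpoint rule and compares the result with the two closed forms. The function name and the choice $\beta = 0.9$ are illustrative assumptions, not part of the notes.

```python
def boosted_energy_density_ratio(beta, n_steps=20000):
    """Return u'/u for isotropic lab-frame radiation via the angle average in (14.18)."""
    gamma2 = 1.0 / (1.0 - beta**2)
    # Average (1 - beta*mu)^2 over mu = cos(theta) in [-1, 1] with the midpoint rule.
    # For an azimuthally symmetric integrand, dOmega/4pi = dmu/2.
    dmu = 2.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        mu = -1.0 + (i + 0.5) * dmu
        total += (1.0 - beta * mu) ** 2 * dmu / 2.0
    return gamma2 * total

beta = 0.9                                      # illustrative electron speed
gamma2 = 1.0 / (1.0 - beta**2)
numeric = boosted_energy_density_ratio(beta)    # direct angle average
closed_form = gamma2 * (1.0 + beta**2 / 3.0)    # right-hand side of (14.18)
alt_form = (4.0 / 3.0) * (gamma2 - 0.25)        # form used in (14.19)
print(numeric, closed_form, alt_form)
```

The three numbers should agree to within the accuracy of the quadrature.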


We also need to take into account the rate at which energy is being scattered out of the radiation
field by these collisions, which we will denote by $P_-$. In the electron frame this is just equal to $P_+$,
but this is not very useful since the incoming radiation is highly beamed in the electron frame and
so $P_-$ is not Lorentz invariant.

To compute $P_-$, consider instead of the energy density $u_\nu(\Omega)$ the photon number density
$n_\nu(\Omega) = u_\nu(\Omega)/h\nu$, which, according to (14.16), transforms as

$$n'_{\nu'}(\Omega')\,d\nu'\,d\Omega' = (\nu'/\nu)\,n_\nu(\Omega)\,d\nu\,d\Omega. \qquad (14.20)$$


The flux of photons in the electron-frame (number per area per second) in a range of frequency
and direction $d\nu'\,d\Omega'$ is $c\,n'_{\nu'}(\Omega')\,d\nu'\,d\Omega'$ and the rate at which these are scattered out of the beam is
just $c\sigma_T\,n'_{\nu'}(\Omega')\,d\nu'\,d\Omega'$ so the total rate of scattering events is

$$\frac{dN}{dt'} = c\sigma_T \int d\Omega' \int d\nu'\, n'_{\nu'}(\Omega') = c\sigma_T\gamma \int d\Omega\,(1 - \beta\cos\theta) \int d\nu\, n_\nu(\Omega) \qquad (14.21)$$


which, for isotropic incident radiation, is

$$\frac{dN}{dt'} = 4\pi\gamma c\sigma_T \int d\nu\, n_\nu(\Omega) = \gamma c\sigma_T n \qquad (14.22)$$


where $n$ is the total number density of photons in the lab-frame.

The proper time interval $dt'$ is related to the lab-frame time interval $dt$ by $dt = \gamma\,dt'$ because of
time-dilation, and therefore the rate of scatterings as measured in the lab frame is just

$$\frac{dN}{dt} = \frac{1}{\gamma}\frac{dN}{dt'} = c\sigma_T n. \qquad (14.23)$$

Now Thomson scattering is independent of the frequency of the photon, so it follows that the rate
at which energy is scattered out of the radiation field is given by multiplying $dN/dt$ by the mean
energy per photon, to give

$$P_- = \langle h\nu\rangle \frac{dN}{dt} = c\sigma_T u. \qquad (14.24)$$

Note that this is precisely the rate at which energy is scattered for a stationary electron. This
is something of a coincidence since we can see from (14.21) that the photons removed from the
radiation field by the moving electron do not have an isotropic distribution in the lab-frame; rather
they have a $(1 - \beta\cos\theta)$ distribution, so the photons propagating in the direction opposite to the
electron are more likely to be scattered.
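
To get a feel for how strongly the $(1 - \beta\cos\theta)$ weighting favours head-on photons, the sketch below integrates that weight over the two hemispheres for isotropic lab-frame radiation. The helper name and the value $\beta = 0.99$ are illustrative assumptions.

```python
def scattered_fraction(beta, mu_min, mu_max, n_steps=10000):
    """Fraction of scattering events with mu = cos(theta) in [mu_min, mu_max].

    Uses the (1 - beta*mu) weighting from (14.21) for isotropic lab-frame photons.
    """
    dmu = (mu_max - mu_min) / n_steps
    part = sum((1.0 - beta * (mu_min + (i + 0.5) * dmu)) * dmu for i in range(n_steps))
    return part / 2.0  # the weight integrates to 2 over the full range [-1, 1]

beta = 0.99                                       # illustrative electron speed
head_on = scattered_fraction(beta, -1.0, 0.0)     # photons moving against the electron
overtaking = scattered_fraction(beta, 0.0, 1.0)   # photons moving with the electron
print(head_on, overtaking)
```

Analytically the two fractions are $(1 + \beta/2)/2$ and $(1 - \beta/2)/2$, so even for $\beta \to 1$ a quarter of the scattered photons were travelling in the same hemisphere as the electron.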

Combining $P_+$ from (14.19) and $P_-$ from (14.24) gives the net inverse Compton power for one
electron of $P = P_+ - P_-$ or

$$P = \frac{4}{3}\,\beta^2\gamma^2\,c\sigma_T u \qquad (14.25)$$

and the total energy transfer rate per unit volume is given by multiplying this by the electron density, 
or more generally by the distribution function n(r, E ) and integrating over energy. 
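
The bookkeeping behind (14.25) can be checked numerically. The sketch below evaluates $P_+$ from (14.19), $P_-$ from (14.24) and the net power from (14.25) in cgs units; the function name, the example Lorentz factor and the example energy density (roughly that of the present-day microwave background) are assumptions made for illustration.

```python
C_CM_S = 2.99792458e10        # speed of light [cm/s]
SIGMA_T_CM2 = 6.6524587e-25   # Thomson cross-section [cm^2]

def inverse_compton_power(gamma, u):
    """Return (P_plus, P_minus, P_net) in erg/s for one electron.

    u is the isotropic lab-frame radiation energy density in erg/cm^3.
    """
    beta2 = 1.0 - 1.0 / gamma**2
    p_plus = (4.0 / 3.0) * C_CM_S * SIGMA_T_CM2 * u * (gamma**2 - 0.25)  # (14.19)
    p_minus = C_CM_S * SIGMA_T_CM2 * u                                   # (14.24)
    p_net = (4.0 / 3.0) * beta2 * gamma**2 * C_CM_S * SIGMA_T_CM2 * u    # (14.25)
    return p_plus, p_minus, p_net

u_example = 4.2e-13  # erg/cm^3, assumed illustrative value
p_plus, p_minus, p_net = inverse_compton_power(gamma=1.0e3, u=u_example)
print(p_plus - p_minus, p_net)
```

The two printed numbers should agree, and for $\gamma = 1$ the net power vanishes as expected.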

Equation (14.25) is remarkably simple, and also remarkably similar to the synchrotron power
and the bremsstrahlung power, for reasons already discussed.

Interestingly, for low velocities, the Compton power is quadratic in the velocity. There is no
first-order effect, since a scattering may increase or decrease the photon energy.


14.4 Compton vs Inverse Compton Scattering 


Equation (14.25) is supposedly valid for all electron energies, and is clearly always positive. However,
this does not make sense. For cold electrons, Compton scattering results in a loss of energy for the
photons via the electron recoil, which was ignored in deriving (14.25).

For low energy electrons (with $v/c = \beta \ll 1$), the radiation in the electron frame is very nearly
isotropic ($\delta\nu/\nu \sim \beta \ll 1$, so consequently the variation of the intensity $\delta I/I \sim \beta \ll 1$ is also
small), so we can incorporate the effect of recoil by simply subtracting the mean photon energy loss
given by (14.7).


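For $v \ll c$ the mean rate of energy transfer is given by $dE/dt = (4/3)\,c\sigma_T u\,v^2/c^2$ while the rate of
scatterings is $dN/dt = c\sigma_T n = c\sigma_T u/\langle\epsilon\rangle$ so the mean photon energy gain per collision (neglecting
recoil) is $\langle\Delta\epsilon\rangle/\langle\epsilon\rangle = (4/3)(v/c)^2$, and if $v \ll c$ this is approximately equal to the mean fractional
energy gain $\langle\Delta\epsilon/\epsilon\rangle = (4/3)(v/c)^2$. For a thermal distribution of electrons, with $\langle v^2\rangle = 3kT/m$, this becomes
$\langle\Delta\epsilon/\epsilon\rangle = 4kT/mc^2$. The mean fractional energy loss due to recoil is, from (14.7), of magnitude $\epsilon/mc^2$, so combining
these gives

$$\left\langle\frac{\Delta\epsilon}{\epsilon}\right\rangle = \frac{4kT - \hbar\omega}{mc^2}. \qquad (14.26)$$

If the photon energy $\epsilon = \hbar\omega > 4kT$ then there is a net transfer of energy to the electrons, and vice versa.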
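
The sign change in (14.26) is easy to explore numerically. The following sketch evaluates the mean fractional energy change for a photon of given energy in gas of given temperature (cgs units); the function name and the example values are illustrative assumptions.

```python
K_B_ERG_K = 1.380649e-16       # Boltzmann constant [erg/K]
M_E_C2_ERG = 8.1871057769e-7   # electron rest energy m c^2 [erg]

def mean_fractional_energy_change(photon_energy, electron_temperature):
    """Return <Delta eps / eps> from (14.26); energy in erg, temperature in K."""
    return (4.0 * K_B_ERG_K * electron_temperature - photon_energy) / M_E_C2_ERG

temperature = 1.0e8                                # K, illustrative hot gas
crossover = 4.0 * K_B_ERG_K * temperature          # photon energy at which the sign flips
soft = mean_fractional_energy_change(0.1 * crossover, temperature)   # photon gains energy
hard = mean_fractional_energy_change(10.0 * crossover, temperature)  # photon loses energy
print(crossover, soft, hard)
```

Photons softer than $4kT$ are heated on average, while harder photons are cooled and hand their energy to the electrons.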


14.5 The Compton y-Parameter

The Compton y-parameter is defined as

$$y = \left\langle\frac{\Delta\epsilon}{\epsilon}\right\rangle \times (\text{number of scatterings}). \qquad (14.27)$$


• In a system with y much less (greater) than unity the spectrum will be little (strongly) affected 
by the scattering(s). 


• In computing $y$ it is usual to use either the non-relativistic expression (14.26) or the highly
relativistic limit $\langle\Delta\epsilon/\epsilon\rangle \simeq 4\gamma^2/3$.

• The mean number of scatterings is given by $\max(\tau, \tau^2)$.

• The y-parameter is generally frequency dependent.


14.6 Repeated Scatterings 

14.6.1 Non-Relativistic, High Optical Depth 


To a crude but useful approximation we can say that the photon gains energy according to (14.26)
in each collision, and if $kT \gg h\nu$ then after $N$ scatterings we have

$$\frac{\langle\epsilon_N\rangle}{\epsilon_0} = \left(1 + \frac{4kT}{mc^2}\right)^N = \left(1 + \frac{1}{N}\frac{4NkT}{mc^2}\right)^N \simeq e^y \qquad (14.28)$$


so the mean net energy boost is the exponential of the Compton y-parameter. 
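
The quality of the approximation in (14.28) can be checked directly. This sketch compounds the per-scattering gain $N$ times and compares it with $e^y$, taking the number of scatterings to be $\max(\tau, \tau^2)$ as stated in §14.5. The function names and the example values of $kT/mc^2$ and $\tau$ are assumptions chosen for illustration.

```python
import math

def compounded_boost(theta, n_scatterings):
    """Mean energy boost (1 + 4*theta)^N from (14.28), with theta = kT/mc^2."""
    return (1.0 + 4.0 * theta) ** n_scatterings

def y_parameter(theta, tau):
    """Compton y for kT >> h nu: 4*theta per scattering times max(tau, tau^2) scatterings."""
    return 4.0 * theta * max(tau, tau**2)

theta = 0.002   # kT/mc^2, roughly 1 keV gas (illustrative)
tau = 10.0      # optical depth (illustrative)
n = max(tau, tau**2)
exact = compounded_boost(theta, n)
approx = math.exp(y_parameter(theta, tau))
print(exact, approx)
```

The two values differ by well under one percent here, because $4kT/mc^2 \ll 1$; the compounded product is always slightly smaller than the exponential.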


14.6.2 Highly-Relativistic, Low Optical Depth 

In the other extreme, $\gamma \gg 1$, the mean energy amplification per collision is $A = \epsilon_1/\epsilon_0 \sim \gamma^2$ so
after $k$ scatterings we expect $\epsilon_k \sim \epsilon_0 A^k$, or equivalently, the number of scatterings required to reach
energy $\epsilon$ is $k \sim \ln(\epsilon/\epsilon_0)/\ln A$.

For low optical depth $\tau \ll 1$ the probability of $k$ scatterings is $p(k) \sim \tau^k$, so the intensity in
highly boosted photons is

$$I(\epsilon_k) \sim I(\epsilon_0)\,\tau^k \sim I(\epsilon_0)\,e^{k\ln\tau} \sim I_0 \exp\left(\frac{\ln\tau\,\ln(\epsilon/\epsilon_0)}{\ln A}\right) \qquad (14.29)$$

so the result is a power law $I(\epsilon) \propto \epsilon^{-p}$ with $p = -\ln\tau/\ln A$.
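
A minimal sketch of the power-law argument above: it computes the index $p = -\ln\tau/\ln A$ with $A \sim \gamma^2$ and checks that $\tau^k$ matches $(\epsilon_k/\epsilon_0)^{-p}$. The function name and the example values of $\tau$ and $\gamma$ are illustrative assumptions, and $A = \gamma^2$ is only the order-of-magnitude amplification used in the text.

```python
import math

def power_law_index(tau, gamma):
    """Index p of I(eps) ~ eps^-p for tau << 1 and gamma >> 1, per (14.29)."""
    amplification = gamma**2   # order-of-magnitude energy gain A per scattering
    return -math.log(tau) / math.log(amplification)

tau, gamma = 0.01, 10.0         # illustrative optical depth and Lorentz factor
p = power_law_index(tau, gamma)

k = 3                                       # number of scatterings
energy_ratio = (gamma**2) ** k              # eps_k / eps_0
intensity_from_tau = tau**k                 # I(eps_k) / I(eps_0) from p(k) ~ tau^k
intensity_from_index = energy_ratio ** (-p) # same ratio from the power law
print(p, intensity_from_tau, intensity_from_index)
```

With these particular numbers $\ln\tau = -\ln A$, so the spectrum falls as $1/\epsilon$.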


14.7 The Sunyaev-Zel’dovich Effect 

The Sunyaev-Zel’dovich effect is the effect on the microwave background radiation induced by
scattering off hot gas.

• Since the input spectrum is nearly thermal and therefore has high occupation number in the R-J
region, a proper treatment requires consideration of stimulated emission. For non-relativistic
gas, the evolution of the spectrum is described by the Kompaneets equation (see R+L) which
is a Fokker-Planck equation describing the diffusion of photon energies. Here we will give only
a qualitative discussion of the effect.

• Photons may be scattered out of or into the line of sight, so the number of photons does not
change. Photons may be scattered up or down in energy. On average, the increase in energy
is $\Delta\epsilon = 4kT\epsilon/mc^2$. The result must be some shift of the energy distribution function $n_\nu(\Omega)$
to generally higher energies, but which preserves the area $\int d\nu\, n_\nu$. This gives an increase in
intensity at high frequencies $h\nu \gg kT_{\rm MBR}$ and a reduction in the R-J region.

• It has been observed for several clusters of galaxies. 

• The R-J SZ decrement gives a fluctuation in brightness temperature $\Delta T/T = -2y$. It is on
the order of $10^{-4}$.

• The SZ effect measures the integral along the line of sight of the electron density times the
temperature $\int dz\, n_e T$, which is proportional to the integral of the pressure along the line of
sight.

• For a cluster of size $R$ the SZ decrement is proportional to $n_e R$ whereas the X-ray emission is
proportional to $n_e^2 R$, so the ratio of the square of the SZ effect to the X-ray emission provides
an estimate of the physical size of the cluster.

• Combining the physical size with the redshift and the angular size of the cluster provides a
direct estimate of the scale of the universe, or equivalently of the Hubble parameter $H_0$.

• There is another effect, the kinematic SZ effect, in which clusters give rise to a temperature
fluctuation via the Doppler effect. This is on the order of the optical depth to electron scattering
times the line of sight velocity and is typically somewhat smaller than the true SZ decrement.


14.8 Compton Cooling and Compton Drag 


Compton scattering of microwave background photons off of moving electrons removes energy from 
electrons. Collisions with ions will replenish the electron energy, so the net result is to cool the gas. 

We can estimate the time-scale for this cooling as follows: For non-relativistic electrons, the
electrons see radiation which is slightly anisotropic; the temperature is $T(\mu) = (1 + \beta\mu)T_0$, where
$T_0$ is the isotropic temperature seen by a stationary observer. The intensity, which scales as the
fourth power of temperature, is then $I(\mu) \simeq (1 + 4\beta\mu)I_0$, and computing the radiative force $F =
(\sigma_T/c)\int d\Omega\, \mu I(\mu)$ gives $F \simeq (4/3)\,u\sigma_T\beta$. This gives an equation of motion for a single electron:

$$\frac{dv}{dt} = -\left(\frac{4}{3}\frac{u\sigma_T}{m_e c}\right) v. \qquad (14.30)$$


The radiation scattering acts like a viscous drag force, and the velocity decays as $v(t) = v(t_0)\exp(-t/\tau_e)$.
The time-scale for the velocities to decay (which is also the time-scale for the gas to cool, provided
collisions can replenish the electron energy sufficiently fast) is

$$\tau_e \sim \frac{m_e c}{u\sigma_T}. \qquad (14.31)$$

The time-scale for ‘Compton-cooling’ exceeds the age of the universe today, but cooling was effective in
the past. It may play an important role in galaxy formation as cooling is necessary to allow the
baryonic material to settle within the dark matter potential well, since otherwise the hot gas would
simply remain as a hot atmosphere in hydrostatic equilibrium.

Compton drag is a frictional force exerted on ionized gas which is moving relative to the
microwave background frame. This is highly analogous to the drag on an electron computed above.
The time-scale is much longer, however. This is because for a blob of ionized plasma, the scattering
cross-section is supplied almost completely by the electrons while the inertia is provided almost
entirely by the ions. For ionized hydrogen, the time-scale for the velocity to decay is then

$$\tau \sim \frac{m_p c}{u\sigma_T} \qquad (14.32)$$

which is longer than $\tau_e$ by a factor $m_p/m_e \sim 2000$. This only becomes effective at very early
times, but results in any ionized gas being effectively locked to the frame in which the microwave
background appears isotropic.
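
The time-scales (14.31) and (14.32) are simple to evaluate. The sketch below does so in cgs units for a blackbody background, assuming a present-day temperature of 2.725 K and the standard scaling $T \propto (1 + z)$; both of these, and all the names used, are assumptions introduced for illustration rather than values taken from the notes.

```python
C_CM_S = 2.99792458e10         # speed of light [cm/s]
SIGMA_T_CM2 = 6.6524587e-25    # Thomson cross-section [cm^2]
M_E_G = 9.1093837e-28          # electron mass [g]
M_P_G = 1.67262192e-24         # proton mass [g]
A_RAD = 7.565733e-15           # radiation constant [erg cm^-3 K^-4]
T_CMB_K = 2.725                # assumed present-day background temperature [K]
SECONDS_PER_YEAR = 3.156e7

def background_energy_density(z):
    """Blackbody energy density u = a T^4, assuming T = T_CMB * (1 + z)."""
    return A_RAD * (T_CMB_K * (1.0 + z)) ** 4

def decay_time_years(mass_per_electron_g, z):
    """Order-of-magnitude decay time m c / (u sigma_T), as in (14.31) and (14.32)."""
    u = background_energy_density(z)
    return mass_per_electron_g * C_CM_S / (u * SIGMA_T_CM2) / SECONDS_PER_YEAR

cooling_now = decay_time_years(M_E_G, z=0.0)    # electron cooling time today
drag_now = decay_time_years(M_P_G, z=0.0)       # drag time for ionized hydrogen today
cooling_z10 = decay_time_years(M_E_G, z=10.0)   # shorter by a factor (1 + z)^4
print(cooling_now, drag_now, cooling_z10)
```

Under these assumptions the present-day cooling time comes out at a few times $10^{12}$ years, far longer than the age of the universe, while the steep $(1 + z)^4$ dependence makes the process much faster at early times.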


14.9 Problems 

14.9.1 Compton scattering 2 

Consider the scattering of a photon off an electron at rest. Write down suitable 4-momentum
vectors $\vec{p}$ to describe the incoming/outgoing states (use subscripts i, o for in/out photon states),
denote photon energies by $\epsilon$, photon directions by $\mathbf{n}$ and outgoing photon 3-momentum by $\mathbf{p}$.

By considering the squared modulus of the outgoing electron 4-momentum (or otherwise) show that

$$\epsilon_o = \frac{\epsilon_i}{1 + (\epsilon_i/mc^2)(1 - \cos\theta)} \qquad (14.33)$$

where $\theta$ is the angle between the incoming/outgoing photon directions, and that the photon
wavelengths are related by

$$\lambda_o = \lambda_i + \lambda_c(1 - \cos\theta) \qquad (14.34)$$

where $\lambda_c = h/mc$ is the Compton wavelength of the electron.

14.9.2 Inverse Compton Effect 

Consider a ‘head on’ collision between a photon of energy $\epsilon$ and an electron with initial Lorentz
factor $\gamma$ (as measured in the ‘lab-frame’) and in which the photon reverses its direction of motion.
(Assume the electron is initially moving left, i.e. with negative velocity, and the photon is travelling to
the right).

What is the energy of the incoming photon in the electron rest frame? What is the energy of
the outgoing photon in the electron rest frame? Neglecting the change of photon energy in the rest
frame, compute the energy of the outgoing photon in the lab frame. Assuming $\gamma \gg 1$, estimate the
fractional error in the lab-frame energy change incurred in neglecting the e-frame energy change.

Now consider a highly relativistic electron propagating through the 3K microwave background.
What is the typical incoming photon energy in the electron rest frame? What is the limit on $\gamma$ such
that it is valid to use the Thomson scattering cross-section $\sigma_T$ to estimate the rate of collisions?

14.9.3 Compton y-parameter 

Define the ‘Compton y-parameter’. Consider a galaxy cluster containing gas at temperature 10 keV,
scattering microwave background photons. Assuming an electron density $n_e \sim 10^{-3}\,{\rm cm}^{-3}$ and size
$\sim 1$ Mpc, estimate the optical depth $\tau$ to electron scattering.

What is the typical change in frequency for a scattered photon?

What is the average change in frequency for a scattered photon?

What is the y-parameter?

What is the fractional change in intensity of the MBR in the Rayleigh-Jeans region?

Clusters are thought to move relative to the microwave background with velocities $v \sim (1$ to $2) \times 10^{-3}c$.
Discuss in general terms what effect this would have on the microwave background
temperature. How does this compare with the inverse Compton effect? How might one disentangle
these effects?

Chapter 15

Field Theory Overview 


Lagrangian dynamics is ideally suited for analyzing systems with many degrees of freedom. Such
systems include lattices of atoms in crystals and, in the ‘continuum limit’, the behavior of continuous
fields. For systems with potentials which are quadratic in the coordinates (which is always the case
for small amplitude oscillations about a potential minimum) one can always find normal modes
of oscillation. These are linear combinations of the coordinates $q_i$ in terms of which the system
becomes a set of independent simple harmonic oscillators. For the crystal lattice, and for classical
fields, these normal modes are just traveling waves. This means that quantum mechanically, the
system is simply described by a set of occupation numbers or energy levels for each of the normal
modes.

The quantum mechanical states of the system in the occupation number representation are

$$|n_1, n_2, \ldots, n_j, \ldots\rangle \qquad (15.1)$$

with $n_j$ giving the occupation number of the $j$th normal mode. For a non-interacting field theory
these states are exact eigenstates of the total Hamiltonian. The non-interacting or ‘free’ field is
rather sterile; it just sits there. It is also highly idealized; in reality the system must interact with
the outside world, and there may also be internal interactions. External interactions, such as wiggling
an atom in a crystal or wiggling a charge in the electro-magnetic field, can add or remove energy,
and so change occupation numbers. Similarly, any degree of anharmonicity of the oscillators will
introduce coupling between the normal modes, so the quantum theory should allow for scattering of
quanta. The usefulness of the multi-particle states (15.1) in the context of interacting fields is that
for many systems the interactions can be considered a small perturbation on the non-interacting
theory, so these are approximate eigenstates of the system and one can apply perturbation theory
to compute transition amplitudes for the system to go from one state to another.

This program leads to quantum field theory, encompassing quantum electrodynamics (which
provides the proper description of the scattering processes which we have treated above in an
approximate classical manner for the most part) and also theories of weak and strong interactions. This is a
huge and formidable subject, combining, as it does, special relativity and quantum mechanics. In the
following three chapters I shall try to give a flavor of these theories with a simplified treatment that
ignores many important features, but which illustrates some features which are of great relevance for
astrophysics. The approach I shall follow is similar to that of Ziman, in his “Elements of Advanced
Quantum Theory”, in that I shall first consider a ‘solid-state’ model consisting of a lattice of coupled
oscillators. This system can be analyzed using regular Lagrangian mechanics, and taking the
continuum limit this becomes a classical Lagrangian field theory for ‘scalar-elasticity’ waves. We then
consider the quantum mechanics of this system, the quanta of which are phonons. This may seem
out of place in a lecture course on astrophysics. Our reason for exploring this model is that if we
‘abstract away’ the underlying physical medium and choose coefficients to make the field equations
relativistically covariant we obtain a quantum field theory for the relativistic massive scalar field,
whose quanta are massive spin-less bosonic particles (chapter 18). These entirely distinct systems
are, in a sense, identical, since they have the same Lagrangian. This means that all the results for
the more concrete, and conceptually less challenging, atomic lattice carry over to the more abstract,
but for us much more interesting, scalar field.

This in itself may seem somewhat removed from the kind of interactions we have mostly been
considering here, which are between electrons (massive but fermionic in nature) and the
electromagnetic field (bosonic but massless and a vector field). One motivation for considering the scalar
field is that it is in many ways the simplest type of matter field, and nicely illustrates many features
of the quantized electro-magnetic field without the complications of polarization, gauge invariance
etc. It shows how mass is introduced into the theory, and the theoretical machinery for calculating
transition amplitudes, reaction rates etc. carries over to quantum electrodynamics and then to
weak-interaction theory and beyond. Another motivation for considering the scalar field is the important
role that such fields play in modern cosmology.

Here is a ‘road-map’ to the next three chapters. 

In chapter 16 we develop classical non-relativistic field theory. In §16.1 we construct a simple
mechanical system consisting of a lattice of coupled oscillators; each oscillator consists of a bead
on a rod with a spring, and the oscillators are connected to their neighbors with coupling springs.
This system is analyzed using regular Lagrangian dynamics, and we obtain the normal modes of
oscillations and dispersion relation etc. for such a lattice.

In §16.2 we take the continuum limit of the Lagrangian for this lattice to obtain the Lagrangian
density function $\mathcal{L}(\dot\phi, \nabla\phi, \phi)$. We then show how the field equations for waves on this lattice can
be derived from this Lagrangian density by requiring, as always, that the action $S$ be extremized.
In §16.3 we explore the conservation laws which arise from the invariance of the Lagrangian density
under time and space translations. The former gives conservation of energy, just as we could have
obtained from regular mechanics. The latter gives rise to something quite new, which is not apparent
in the usual mechanical analysis; this is the wave-momentum, which is conserved independently of
the ‘microscopic’ momentum embodied in the canonical momenta of the system. We show that the
energy and wave-momenta of classical wave packets are related by $\mathbf{P}/E = \mathbf{k}/\omega$ where $\mathbf{k}$ and $\omega$ are
the spatial and temporal frequencies respectively.

The lattice model we explore (see figure 16.1) is very specific; in addition to the coupling springs,
which allow waves to propagate along the lattice, there are internal springs. These have the
consequence that the oscillation frequency tends to a finite lower limit as $k \to 0$. In fact the dispersion
relation is

$$\omega^2 = k^2c^2 + m^2c^4/\hbar^2 \qquad (15.2)$$

where $c$ is the wave-speed for high momentum waves, and $m$ is a parameter with units of mass
determined by the spring constants and bead masses. In §16.4 we show how the classical
wave-packets for this field have energy-momentum relation $E^2 = P^2c^2 + M^2c^4$ which mimics that of
relativistic quanta. This is quite deliberate (it allows this non-relativistic system to illustrate many
features of relativistic fields) and we show, for instance, that the momentum of a wave packet is
given by $\mathbf{P} = \gamma M\mathbf{v}$, where $\gamma = (1 - v^2/c^2)^{-1/2}$ and $\mathbf{v}$ is the group velocity. We also show in §16.5
that the solutions of this system obey a covariance: if $\phi(\mathbf{x}, t)$ is a solution then $\phi'(\mathbf{x}, t) = \phi(\mathbf{x}', t')$ is
also a solution, where $(\mathbf{x}, t)$ and $(\mathbf{x}', t')$ are related by a transformation which is formally identical
to a Lorentz transformation.
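
The relativistic flavour of a dispersion relation of the form $\omega^2 = k^2c^2 + \mu^2$ can be seen in a few lines. The sketch below, in units with $c = 1$ and with $\mu$ denoting the $k \to 0$ frequency (the symbol $\mu$ and the function names are assumptions for illustration), checks numerically that the group velocity is $kc^2/\omega$, that it never exceeds $c$, and that the Lorentz-like factor built from it equals $\omega/\mu$.

```python
import math

def omega(k, mu, c=1.0):
    """Dispersion relation omega(k) = sqrt(k^2 c^2 + mu^2); mu is the k -> 0 frequency."""
    return math.sqrt((k * c) ** 2 + mu**2)

def group_velocity(k, mu, c=1.0, dk=1.0e-6):
    """Group velocity d(omega)/dk from a centered finite difference."""
    return (omega(k + dk, mu, c) - omega(k - dk, mu, c)) / (2.0 * dk)

k, mu = 3.0, 4.0                      # illustrative values, giving omega = 5
v = group_velocity(k, mu)             # expected k c^2 / omega = 0.6
gamma = 1.0 / math.sqrt(1.0 - v**2)   # Lorentz-like factor with c = 1
print(v, gamma, omega(k, mu) / mu)
```

Since $\mathbf{P}/E = \mathbf{k}/\omega = \mathbf{v}/c^2$, a packet with energy $E = \gamma Mc^2$ then carries momentum $\gamma M\mathbf{v}$, which is the relation quoted above.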

To round off our discussion of classical non-relativistic fields we consider interacting field theories
in §16.6. We discuss how interactions (which couple the otherwise independent planar traveling
wave solutions) can be introduced either through non-linear springs (a ‘self-interaction’) or by
coupling between different fields. We show that interactions between waves are particularly efficient
if the waves obey a ‘resonance’ condition; this condition is that the sum of the spatial and temporal
frequencies of the incoming and outgoing waves should be equal. This condition arises again later,
where in the quantum mechanical analysis the interaction rates include an energy and momentum
conserving $\delta$-function which enforces this resonance condition.

In chapter 17 we develop the quantum mechanical description of non-relativistic fields. We start
with the quantization of a single simple harmonic oscillator in §17.1 where we introduce the creation
and destruction operators $a^\dagger$ and $a$ which are central to all quantum field calculations. In §17.2 we
describe the ‘interaction picture’, which is a hybrid of the Schroedinger and Heisenberg pictures,
and show how the time-dependent perturbation theory in this formalism leads to the ‘S-matrix
expansion’ in §17.2.1. This is illustrated with the example of a forced oscillator in §17.2.2.

In §17.3 we apply these concepts to free (i.e. non-interacting) fields, using the example of the
scalar elasticity model. We first obtain the creation and destruction operators for phonons of the
discrete lattice system in §17.3.1 and show that these have the appropriate commutation relations
and are related to the Hamiltonian in the proper way. We generalize to continuous fields in one or
more dimensions in §17.3.2.

In §17.4 we introduce perturbative interactions and compute scattering rates for various processes.
These include scattering off an impurity in the lattice (§17.4.1); scattering of phonons via a $\lambda\phi^4$
self interaction (§17.4.2); and in §17.4.3 we consider ‘second-order’ scattering of a phonon by the
exchange of a virtual phonon of a different type. All of these are worked through in detail. In §17.4.4
we briefly discuss the contour integral formalism for computing scattering rates and we discuss the
‘Feynman rules’ for this scalar phonon system in §17.4.5. In §17.4.6 we discuss the ‘kinematic
constraints’ placed on scattering and decay processes by the requirements of conservation of energy
and momentum.

In chapter 18 we turn to relativistic quantum fields. In §18.1 we develop the theory of the
massive scalar field (or Klein-Gordon field). This proves to be very simple since the system is formally
identical to the ‘scalar-elasticity’ model we had considered in the previous two chapters. Making this
theory relativistically covariant is little more than choosing appropriate coefficients in the
Hamiltonian. We then discuss self-interactions, spontaneous symmetry breaking and scattering of particles
via a coupling between independent fields. We discuss the generalization of the scalar field theory
to more complicated fields and discuss quantum electrodynamics in §18.2, though the treatment is
rather shallow. In §18.3 we show how the scattering amplitudes computed by perturbation theory
lead immediately to kinetic theory in the form of the fully relativistic, fully quantum-mechanical
Boltzmann equation.

In §18.4 we return to the scalar field and explore the role of such fields in cosmology. We show how
such fields can drive inflation, either in the very early or very late Universe; how they can behave like
‘cold dark matter’ or like relativistic particles and we also show how, with an appropriate potential
function, they can lead to domain-walls, cosmic strings and other ‘topological defects’. In §18.5 we
explore in a little more detail the evolution of the Klein-Gordon field in the non-relativistic limit
(i.e. where the wavelength is much greater than the Compton wavelength). We show that if we
factor out a rapid common oscillation factor, the equation of motion for such fields is in fact the
time dependent Schroedinger equation. We discuss the correspondence principle, and also show how
non-relativistic fields are coupled to gravity.

Chapter 16

Classical Field Theory 


Here we develop the classical Lagrangian approach to field theory. We first introduce a model
consisting of a discrete lattice of coupled oscillators. While fairly simple, the model proves to be
extremely versatile and the properties of the sound waves on this lattice demonstrate many of the
properties usually associated with relativistic fields. Taking the ‘continuum limit’, we show how the
equations of motion for a field can be generated from the Lagrangian density. Next, we derive the
conservation laws for momentum and energy; the latter is very similar to that in regular Lagrangian
mechanics, but the former is very different and is a uniquely field theoretical construct. We explore
the relation between wave energy and momentum for sound waves in our model system (which is
formally identical to that for relativistic particles) and also show how the sound-wave equations
display an invariance under Lorentz-like boost transformations. We then consider the effect of adding
interaction terms to the Lagrangian. These introduce a coupling between the otherwise independent
normal modes, and we emphasize the resonance conditions that must be satisfied to allow effective
coupling. We then discuss some puzzles and paradoxes concerning the wave momentum. We next
show that in the long-wavelength limit (corresponding to the non-relativistic limit), in addition
to energy and momentum, a fifth quantity is conserved. This extra conservation law is the
wave-mechanical analog of conservation of particle number or proper mass. It is only approximately
conserved, but this approximation becomes exact in the limit of small group velocity. Finally,
we develop the equations governing the transport of energy and momentum in a self-interacting
field theory. We show that the sound-wave energy density behaves just like a collisional gas, with
the same equations of motion, and with exactly the same adiabatic indices in the relativistic and
non-relativistic limits.


16.1 The BRS Model 


Consider a system consisting of a 1-dimensional array of identical beads of mass $M$ constrained
to move in the vertical direction by rods and with a spring of spring constant $K$ connected to the
particle. The rods are assumed to be infinitely stiff, and the beads slide up and down with no friction.
We will refer to this beads, rods and springs system as the ‘BRS’ model.

Let $\phi_j$ denote the displacement of the $j$th particle from the rest position. As it stands we simply
have a set of independent SHM oscillators with equations of motion $\ddot\phi_j = -(K/M)\phi_j$. Now add some
additional springs with spring constant $K'$ which link the beads to their neighbors as illustrated in
figure 16.1. The Lagrangian is just the kinetic minus the potential energy, which we can readily
write down:

$$L(\dot\phi_i, \phi_i) = \sum_i \left[\frac{M}{2}\dot\phi_i^2 - \frac{K}{2}\phi_i^2 - \frac{K'}{4}(\phi_{i-1} - \phi_i)^2 - \frac{K'}{4}(\phi_{i+1} - \phi_i)^2\right] \qquad (16.1)$$


The reason for the factor 1/4 in the energy for the connecting springs is that the energy in these
springs is shared between two neighbors.

Figure 16.1: A lattice of coupled oscillators. Massive beads are constrained to move in the vertical
direction by rods, and tethered to the base by springs with spring constant $K$. Neighboring particles
are also coupled by springs of spring constant $K'$. The displacement of the $j$th bead is $\phi_j$.


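The equations of motion for this system are the Euler-Lagrange equations

$$\frac{d}{dt}\left(\frac{\partial L}{\partial\dot\phi_j}\right) = \frac{\partial L}{\partial\phi_j}. \qquad (16.2)$$

The partial derivatives with respect to the velocities appearing on the LHS are

$$\frac{\partial L}{\partial\dot\phi_j} = M\dot\phi_j \qquad (16.3)$$

and the partial derivatives with respect to the displacements on the RHS are

$$\frac{\partial L}{\partial\phi_j} = K'\left[(\phi_{j-1} - \phi_j) + (\phi_{j+1} - \phi_j)\right] - K\phi_j \qquad (16.4)$$

so the Euler-Lagrange equations are

$$M\ddot\phi_j - K'\left[\phi_{j-1} - 2\phi_j + \phi_{j+1}\right] + K\phi_j = 0. \qquad (16.5)$$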


In terms of the $\phi_j$ we have a set of coupled oscillators. It is easy to find a set of normal modes
which are decoupled. Let us imagine we construct a circular loop of $N$ of these systems (which means
we don’t need to worry about boundary conditions) and define the discrete Fourier transform of
the displacements

$$\Phi_k(t) = \sum_j \phi_j(t)\,e^{2\pi ijk/N}. \qquad (16.6)$$

The transform of the Euler-Lagrange equations is then

$$\sum_j \left[M\ddot\phi_j - K'\left[\phi_{j-1} - 2\phi_j + \phi_{j+1}\right] + K\phi_j\right] e^{2\pi ikj/N} = 0. \qquad (16.7)$$

Now $\sum_j \phi_{j-1}e^{2\pi ikj/N} = \Phi_k e^{2\pi ik/N}$ (and similarly $\sum_j \phi_{j+1}e^{2\pi ikj/N} = \Phi_k e^{-2\pi ik/N}$) and
$\sum_j \phi_j e^{2\pi ikj/N} = \Phi_k$, so (16.7) says

$$M\ddot\Phi_k + \left[2K'(1 - \cos 2\pi k/N) + K\right]\Phi_k = 0 \qquad (16.8)$$

Figure 16.2: Dispersion relation for the discrete lattice model (16.9). The minimum frequency is
$\omega_{\min} = \sqrt{K/M}$ at $k = 0$. Here all of the beads oscillate up and down together. The maximum
frequency is $\omega_{\max} = \sqrt{(4K' + K)/M}$ at $k = \pm N/2$. Here neighboring beads are 180 degrees out
of phase with each other. The frequency is also defined for wave-numbers $|k| > N/2$, but these are
just aliased versions of the waves within the range $|k| < N/2$. For the case $K \ll K'$, i.e. where the
connecting springs are relatively strong, and as assumed here, there is another scale in the problem:
$k_* = (N/2\pi)\sqrt{K/K'}$. The significance of this spatial frequency scale is that for $k \ll k_*$ most of the
potential energy is in the $K$-springs, while for $k \gg k_*$ most of the potential energy is in the $K'$
springs which link the particles. For $k \ll N$ (i.e. wavelengths much bigger than the separation of
the oscillators) the dispersion relation is $\omega(k) \simeq \omega_{\min}\sqrt{1 + (k/k_*)^2}$. Thus, for $k_* \ll k \ll N$ we have
$\omega \propto k$ and the waves are non-dispersive.


so evidently the discrete transforms <!>£, defined in (16.61 are a set of normal modes; i.e. the equation 


of motion for the mode k is independent of that for all of the other modes k! ^ k. 


The solutions of (16.81 are &k{t) = <Lkoe- ±lut with frequency given by the dispersion relation 


>{k) = 


2K'(l - cos 2-kU/N) + K 
M 


(16.9) 


as sketched in figure [T672| 
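As a quick numerical check of (16.8) and (16.9), the following sketch applies the lattice restoring force to a single Fourier mode on a ring and confirms that the mode is an eigenvector with eigenvalue $M\omega(k)^2$. The function names and the parameter values are illustrative assumptions, not part of the original notes.

```python
import cmath
import math


def omega(k, N, K, Kp, M):
    """Dispersion relation (16.9) for integer mode index k on a ring of N beads."""
    return math.sqrt((2.0 * Kp * (1.0 - math.cos(2.0 * math.pi * k / N)) + K) / M)


def restoring_force(phi, K, Kp):
    """Return -K'(phi_{n-1} - 2 phi_n + phi_{n+1}) + K phi_n for every site n on a ring."""
    N = len(phi)
    return [
        -Kp * (phi[(n - 1) % N] - 2.0 * phi[n] + phi[(n + 1) % N]) + K * phi[n]
        for n in range(N)
    ]


def mode_residual(k, N, K, Kp, M):
    """Largest |force_n - M omega(k)^2 phi_n| for the mode phi_n = exp(-2 pi i n k / N)."""
    phi = [cmath.exp(-2j * math.pi * n * k / N) for n in range(N)]
    force = restoring_force(phi, K, Kp)
    w2 = omega(k, N, K, Kp, M) ** 2
    return max(abs(f - M * w2 * p) for f, p in zip(force, phi))


if __name__ == "__main__":
    N, K, Kp, M = 16, 1.0, 25.0, 2.0  # arbitrary illustrative values with K << K'
    for k in range(N):
        print(k, omega(k, N, K, Kp, M), mode_residual(k, N, K, Kp, M))
```

The residual should sit at the level of floating point rounding for every $k$, which is just the statement that the modes decouple.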

For completeness, we note that the momentum conjugate to the displacement $\phi_i$ is

$$p_i = \frac{\partial L}{\partial \dot\phi_i} = M\dot\phi_i \qquad (16.10)$$

and that the Hamiltonian is

$$H(p_i, \phi_i) = \sum_i p_i \dot\phi_i - L = \sum_i p_i^2/M - L. \qquad (16.11)$$

This is identical in form to the Lagrangian (16.1) save that the terms involving the spring constants
$K$, $K'$ have positive sign. We could have inferred this from the fact that $L = T - U$ while the
Hamiltonian is the total energy $H = T + U$.


16.2 The Continuum Limit 

It is interesting to consider the limiting situation where the wavelength is much greater than the
spacing between the units (or equivalently, for finite wavelength, in the limit that the spacing $\Delta x \to 0$).
In this limit, the displacement varies little from unit to unit, so viewed macroscopically $\phi_j$ behaves
like a continuously varying field $\phi(x)$ with $x = j\Delta x$. We can therefore replace finite differences like
$\phi_{j+1} - \phi_j$ in the Lagrangian with $\Delta x \nabla\phi$.

The Lagrangian then becomes

$$L = \sum_j \Delta x \left[ \frac{M}{2\Delta x}\dot\phi^2 - \frac{K'\Delta x}{2}(\nabla\phi)^2 - \frac{K}{2\Delta x}\phi^2 \right] \qquad (16.12)$$

and in the limit $\Delta x \to 0$ the sum becomes an integral: $\sum \Delta x \ldots \to \int dx \ldots$, so the Lagrangian is

$$L = \int dx\, \mathcal{L}(\dot\phi, \nabla\phi, \phi) \qquad (16.13)$$


where we have defined the Lagrangian density

$$\mathcal{L}(\dot\phi, \nabla\phi, \phi) = \tfrac{1}{2}\rho(\dot\phi^2 - c_s^2(\nabla\phi)^2 - \mu^2\phi^2). \qquad (16.14)$$

This is a quadratic function of the field and its space and time derivatives with constant coefficients

$$\rho = \frac{M}{\Delta x}, \qquad c_s^2 = \frac{K'\Delta x^2}{M}, \qquad \mu^2 = \frac{K}{M}. \qquad (16.15)$$

Here $\rho$ is the line density; $\mu$ is the frequency for the oscillators if decoupled, and $c_s$, as we shall see,
is the asymptotic wave velocity for high spatial frequency waves. Similarly, the total Hamiltonian is
a spatial integral

$$H = \int dx\, \mathcal{H}(\dot\phi, \nabla\phi, \phi) \qquad (16.16)$$

where the Hamiltonian density is

$$\mathcal{H}(\dot\phi, \nabla\phi, \phi) = \tfrac{1}{2}\rho(\dot\phi^2 + c_s^2(\nabla\phi)^2 + \mu^2\phi^2) \qquad (16.17)$$

where the three terms represent the kinetic energy density and the densities of energy in the springs
$K'$ and $K$ respectively.

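The action now becomes a 2-dimensional integral over time and space:

$$S = \int dt \int dx\, \mathcal{L}(\dot\phi, \nabla\phi, \phi). \qquad (16.18)$$

For a system with a finite number of degrees of freedom $n$ (the displacements of the $n$ beads in our
lattice system) we obtain the Euler-Lagrange equations by varying the $n$ paths $q_i(t) \to q_i(t) + \delta q_i(t)$
and requiring that the action be stationary. Here the index $i$ has become the continuous variable $x$
and we require that $S$ above be stationary with respect to variations of the field $\phi(x,t)$:

$$\phi(x,t) \to \phi'(x,t) = \phi(x,t) + \delta\phi(x,t). \qquad (16.19)$$

The variation of the field velocity and gradient are

$$\dot\phi' = \dot\phi + d(\delta\phi)/dt = \dot\phi + \delta\dot\phi \qquad (16.20)$$

and

$$\nabla\phi' = \nabla\phi + \nabla(\delta\phi), \qquad (16.21)$$

so the variation of the action $\delta S = S' - S$ corresponding to the field variation (16.19) is

$$\delta S = \int dt \int dx\, \mathcal{L}(\dot\phi + \delta\dot\phi, \nabla\phi + \nabla(\delta\phi), \phi + \delta\phi) - \int dt \int dx\, \mathcal{L}(\dot\phi, \nabla\phi, \phi). \qquad (16.22)$$

For an infinitesimal variation, we can make a 1st order Taylor expansion to give

$$\delta S = \int dt \int dx \left[ \delta\dot\phi \frac{\partial\mathcal{L}}{\partial\dot\phi} + \nabla(\delta\phi) \frac{\partial\mathcal{L}}{\partial(\nabla\phi)} + \delta\phi \frac{\partial\mathcal{L}}{\partial\phi} \right]. \qquad (16.23)$$

Reversing the order of the integrals, the first term is the integral over position of

$$\int dt\, \delta\dot\phi \frac{\partial\mathcal{L}}{\partial\dot\phi}. \qquad (16.24)$$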


In this integral, $x$ is to be considered constant, so the symbol $\delta\dot\phi$ denotes the ordinary time derivative
of $\delta\phi(x,t)$ at this position. This means we can integrate by parts to give

$$\int dt\, \delta\dot\phi \frac{\partial\mathcal{L}}{\partial\dot\phi} = \left[ \delta\phi \frac{\partial\mathcal{L}}{\partial\dot\phi} \right]_{t_1}^{t_2} - \int dt\, \delta\phi \frac{d(\partial\mathcal{L}/\partial\dot\phi)}{dt} \qquad (16.25)$$

where the operator $d/dt$ denotes the derivative with respect to time at fixed position $x$. Now the
boundary term vanishes if $\delta\phi(x,t)$ is assumed to vanish at the initial and final times $t_1$, $t_2$. Alternatively,
we can take the time integral from $t = -\infty$ to $t = +\infty$ and require that the fields tend to zero as
$t \to \pm\infty$. This allows us to replace the integral involving $\delta\dot\phi$ by one involving $\delta\phi$.

A word on the notation is perhaps in order. The time derivative operator $d/dt$ here is really the
partial derivative with respect to time at constant position. Normally we would write this as $\partial/\partial t$
but here that would be ambiguous since it does not tell us what variables are to be held constant.
In general, the Lagrangian density may depend explicitly on time (e.g. if any of the coefficients like
$\rho$ and $\mu$ were time dependent) and we have $\mathcal{L}(\dot\phi, \nabla\phi, \phi, t)$. When we write $\partial(\partial\mathcal{L}/\partial\dot\phi)/\partial t$ we mean
the time derivative at constant value of the field and its derivatives. In our case this vanishes since
$\partial\mathcal{L}/\partial t = 0$. However, if we evaluate $\partial\mathcal{L}/\partial\dot\phi$ using the actual field $\phi(x,t)$ and its derivatives, this is
some specific function of $x$ and $t$. This has a well defined time derivative at constant $x$, which is
what we mean when we write $d(\partial\mathcal{L}/\partial\dot\phi)/dt$.

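Similarly, the spatial integral appearing in the second term is

$$\int dx\, \nabla(\delta\phi) \frac{\partial\mathcal{L}}{\partial(\nabla\phi)} = \left[ \delta\phi \frac{\partial\mathcal{L}}{\partial(\nabla\phi)} \right] - \int dx\, \delta\phi \frac{d(\partial\mathcal{L}/\partial(\nabla\phi))}{dx} \qquad (16.26)$$

where the operator $d/dx$ denotes the derivative with respect to position at fixed time. Here again
we can discard the boundary terms if we assume that the fields tend to zero at spatial infinity, or if
we invoke periodic boundary conditions.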

With these substitutions, every term in (16.23) now contains a multiplicative factor $\delta\phi$, and the
variation of the action is

$$\delta S = -\int dt \int dx\, \delta\phi \left[ \frac{d(\partial\mathcal{L}/\partial\dot\phi)}{dt} + \frac{d(\partial\mathcal{L}/\partial(\nabla\phi))}{dx} - \frac{\partial\mathcal{L}}{\partial\phi} \right]. \qquad (16.27)$$

The actual field $\phi(x,t)$ is such that the action is extremized; i.e. the variation in the action must
vanish for an arbitrary perturbation $\delta\phi(x,t)$ about this field. This means that the quantity in
brackets must vanish at all points $(x,t)$. This gives the Euler-Lagrange equation

$$\frac{d(\partial\mathcal{L}/\partial\dot\phi)}{dt} + \frac{d(\partial\mathcal{L}/\partial(\nabla\phi))}{dx} = \frac{\partial\mathcal{L}}{\partial\phi}, \qquad (16.28)$$

which provides the equation of motion for the field.

This was derived here for a Lagrangian density with no explicit time or position dependence, but
had we taken instead $\mathcal{L}(\dot\phi, \nabla\phi, \phi, x, t)$ we would have obtained precisely the same variation of the
action (16.22) and therefore we would also have obtained precisely the same field equations as in
(16.28) above.


This is quite general, but also somewhat abstract. If we specialize to the BRS model Lagrangian
density (16.14), for example, the Euler-Lagrange equation (16.28) becomes

$$\ddot\phi - c_s^2\nabla^2\phi + \mu^2\phi = 0. \qquad (16.29)$$

Figure 16.3: The panel on the left shows the continuum limit dispersion relation for the BRS
model. The center panel shows the phase velocity and the right panel shows the group velocity.


This is a 1-dimensional wave equation, with traveling wave solutions

$$\phi(x,t) = \phi_0 e^{i(\omega t - kx)} \qquad (16.30)$$

and dispersion relation

$$\omega(k)^2 = c_s^2 k^2 + \mu^2. \qquad (16.31)$$
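One way to convince oneself that (16.30) solves (16.29) only when the dispersion relation (16.31) holds is to evaluate the wave equation by finite differences. The sketch below does this for the real part of the plane wave; the helper names, the step size `h` and the sample values are assumptions made for illustration.

```python
import math


def plane_wave(x, t, k, c_s, mu, phi0=1.0):
    """Real part of (16.30), with omega fixed by the dispersion relation (16.31)."""
    w = math.sqrt(c_s**2 * k**2 + mu**2)
    return phi0 * math.cos(w * t - k * x)


def wave_equation_residual(f, x, t, c_s, mu, h=1e-3):
    """Centered finite-difference estimate of phi_tt - c_s^2 phi_xx + mu^2 phi at (x, t)."""
    phi_tt = (f(x, t + h) - 2.0 * f(x, t) + f(x, t - h)) / h**2
    phi_xx = (f(x + h, t) - 2.0 * f(x, t) + f(x - h, t)) / h**2
    return phi_tt - c_s**2 * phi_xx + mu**2 * f(x, t)


if __name__ == "__main__":
    c_s, mu, k = 1.5, 0.7, 2.0
    on_shell = lambda x, t: plane_wave(x, t, k, c_s, mu)
    off_shell = lambda x, t: math.cos(1.0 * t - k * x)  # frequency chosen to violate (16.31)
    print(wave_equation_residual(on_shell, 0.3, 0.2, c_s, mu))
    print(wave_equation_residual(off_shell, 0.3, 0.2, c_s, mu))
```

The first residual should be small (limited by the $O(h^2)$ truncation error), while a wave whose frequency does not satisfy (16.31) leaves a residual of order unity.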


This dispersion relation is entirely equivalent to the low-frequency limit of (16.9) with $i = x/\Delta x =
Nx/L$ and with the integer mode index $k$ replaced by $k/\Delta k$ with $\Delta k = 2\pi/L$. We could also have
obtained the continuum limit wave equation directly from the discrete version in a similar way. Our
motivation for taking the more laborious route above is to illustrate how the wave-equation etc. can
be derived directly from a Lagrangian density like (16.14) for a continuous field $\phi$, regardless of
whether this is really the limit of some discrete lattice system.
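The equivalence of the two dispersion relations at long wavelength can be checked directly. The sketch below assumes the identification $c_s^2 = K'\Delta x^2/M$, $\mu^2 = K/M$ that follows from comparing the continuum Lagrangian with the lattice one, and converts the integer mode index to a physical wave-number $2\pi k/(N\Delta x)$; function names and numbers are illustrative only.

```python
import math


def omega_lattice(k_index, N, K, Kp, M):
    """Lattice dispersion relation (16.9) for integer mode index k_index."""
    return math.sqrt((2.0 * Kp * (1.0 - math.cos(2.0 * math.pi * k_index / N)) + K) / M)


def omega_continuum(k_index, N, K, Kp, M, dx=1.0):
    """Continuum dispersion relation (16.31) evaluated at wave-number 2 pi k_index / (N dx)."""
    c_s = dx * math.sqrt(Kp / M)   # asymptotic wave speed
    mu = math.sqrt(K / M)          # frequency of the decoupled oscillators
    kappa = 2.0 * math.pi * k_index / (N * dx)
    return math.sqrt(c_s**2 * kappa**2 + mu**2)


if __name__ == "__main__":
    N, K, Kp, M = 1000, 1.0, 25.0, 2.0
    for k_index in (1, 5, 50, N // 2):
        print(k_index, omega_lattice(k_index, N, K, Kp, M), omega_continuum(k_index, N, K, Kp, M))
```

For mode indices much smaller than $N$ the two frequencies agree closely; near $k = N/2$ the lattice frequency saturates while the continuum one keeps growing.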

Just as before, the dispersion relation (16.31) defines a characteristic spatial frequency $k_* = \mu/c_s$.
Referring to the Hamiltonian density (16.17), for $k \gg k_*$ most of the potential energy is in
the field gradient term $\rho c_s^2(\nabla\phi)^2/2$ while for $k \ll k_*$ the potential energy is mostly $\rho\mu^2\phi^2/2$. The phase
velocity is

$$v_{\mathrm{phase}} = \frac{\omega_k}{k} = c_s\sqrt{1 + k_*^2/k^2} \qquad (16.32)$$

where $\omega_k = \omega(k)$. This tends to infinity for low wave numbers $k \ll k_*$ and to $c_s$ for $k \gg k_*$. The
group velocity is

$$v_{\mathrm{group}} = \frac{d\omega_k}{dk} = \frac{c_s^2 k}{\omega_k} = \frac{c_s}{\sqrt{1 + (k_*/k)^2}}. \qquad (16.33)$$

The group velocity gives the speed of motion of a wave-packet, and therefore also the speed at which
information can be propagated. Here the group velocity tends to $c_s$ as $k \to \infty$. For small wave
numbers $k \ll k_*$, the group velocity is proportional to $k$. The dispersion relation and the phase and
group velocities are illustrated in figure 16.3.
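The phase and group velocities can be tabulated with a few lines of code. The sketch below is only an illustration (function names and sample values are assumptions); it makes visible the fact that $v_{\mathrm{phase}} v_{\mathrm{group}} = c_s^2$ for this dispersion relation, so the phase velocity always exceeds $c_s$ while the group velocity never does.

```python
import math


def omega_k(k, c_s, mu):
    """Continuum dispersion relation: omega^2 = c_s^2 k^2 + mu^2."""
    return math.sqrt(c_s**2 * k**2 + mu**2)


def v_phase(k, c_s, mu):
    """Phase velocity omega_k / k."""
    return omega_k(k, c_s, mu) / k


def v_group(k, c_s, mu):
    """Group velocity d omega_k / dk = c_s^2 k / omega_k."""
    return c_s**2 * k / omega_k(k, c_s, mu)


if __name__ == "__main__":
    c_s, mu = 2.0, 3.0
    k_star = mu / c_s
    for k in (0.01 * k_star, k_star, 100.0 * k_star):
        print(k, v_phase(k, c_s, mu), v_group(k, c_s, mu))
```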

The generalization to two or more spatial dimensions is quite straightforward. We could, for
example, cover a table with a grid of vertical rods with beads bouncing up and down on springs, and
couple each bead to its four nearest neighbors in much the same way as we did in one dimension.
The 1-dimensional discrete index $i$ could then be replaced by a 2-vector $\mathbf{i}$ with integer components
$i_1$, $i_2$. Taking the continuum limit then involves replacing $\mathbf{i}$ with the continuous variable $\mathbf{x} = \mathbf{i}\,\Delta x$.
The Lagrangian density would now be

$$\mathcal{L}(\dot\phi, \partial\phi/\partial x^1, \partial\phi/\partial x^2, \phi) = \mathcal{L}(\dot\phi, \phi_{,j}, \phi) \qquad (16.34)$$

where $\phi_{,j} = \partial\phi/\partial x^j$, and the action becomes a three dimensional integral $S = \int dt \int d^2x\, \mathcal{L}$. Following
the same reasoning as above we are led to an Euler-Lagrange equation like (16.28), but
with multiple spatial gradient terms. Using the Einstein summation convention, this can be written
succinctly as

$$\frac{d(\partial\mathcal{L}/\partial\dot\phi)}{dt} + \frac{d(\partial\mathcal{L}/\partial\phi_{,j})}{dx^j} = \frac{\partial\mathcal{L}}{\partial\phi}. \qquad (16.35)$$

The continuum limit Lagrangian density for the multi-dimensional lattice system is identical
to (16.14), but where now the $\nabla$ operator is the multi-dimensional spatial gradient operator, so
$(\nabla\phi)^2 = \phi_{,j}\phi_{,j}$. Taking the derivative with respect to the field gradients gives $\partial\mathcal{L}/\partial\phi_{,j} = -c_s^2\rho\phi_{,j}$,
so $d(\partial\mathcal{L}/\partial\phi_{,j})/dx^j = -c_s^2\rho\phi_{,jj} = -c_s^2\rho\nabla^2\phi$. The continuum limit equation of motion for the
multi-dimensional system is then identical to (16.29), but where now $\nabla^2$ denotes the Laplacian operator.


16.3 Conservation of Wave-Momentum 


In ordinary classical mechanics, for a closed system, i.e. one for which the Lagrangian has no explicit
time dependence, the total energy $E = \sum_i p_i\dot q_i - L$ is conserved. In classical Lagrangian field theory,
the time is replaced by space-time coordinates $t \to x^\mu = (t, \mathbf{x})$. The action is $S = \int dt\, L \to
\int dt \int d^3x\, \mathcal{L}$ for example, and the 2nd time derivative operator in the equation of motion becomes
the wave operator $d^2/dt^2 \to \partial^2/\partial t^2 - c_s^2\nabla^2$. If the Lagrangian density does not depend explicitly
on the spatial coordinates $x^i$, as in the continuous field theories considered here, then there are
additional conserved quantities, whose properties we will now elucidate.

To start with, let the Lagrangian density be $\mathcal{L}(\dot\phi, \phi_{,i}, \phi, x^i, t)$, where, as usual $\phi_{,i} = \partial\phi/\partial x^i$
(remember $\partial/\partial x^i$ here means space derivative at fixed time $t$). We have allowed here for an explicit
dependence on time or position. In the underlying discrete lattice model, the former could describe
a time variation of the spring constants, for example, and the latter could describe a system where
the masses and/or springs are not all identical.

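Now let’s calculate $d\mathcal{L}/dx^i$, the total derivative of $\mathcal{L}$ with respect to spatial coordinate $x^i$ at
constant time. This is

$$\frac{d\mathcal{L}}{dx^i} = \frac{\partial\mathcal{L}}{\partial\dot\phi}\frac{\partial\dot\phi}{\partial x^i} + \frac{\partial\mathcal{L}}{\partial\phi_{,j}}\frac{\partial\phi_{,j}}{\partial x^i} + \frac{\partial\mathcal{L}}{\partial\phi}\frac{\partial\phi}{\partial x^i} + \frac{\partial\mathcal{L}}{\partial x^i} \qquad (16.36)$$

where summation over the repeated index $j$ is implied. Now the penultimate term here is $(\partial\mathcal{L}/\partial\phi)\phi_{,i}$,
but we can eliminate $\partial\mathcal{L}/\partial\phi$ using the Euler-Lagrange equation (16.28), in its multi-dimensional form (16.35), to obtain

$$\frac{d\mathcal{L}}{dx^i} = \frac{\partial\mathcal{L}}{\partial\dot\phi}\frac{\partial\phi_{,i}}{\partial t} + \frac{d(\partial\mathcal{L}/\partial\dot\phi)}{dt}\phi_{,i} + \frac{\partial\mathcal{L}}{\partial\phi_{,j}}\frac{\partial\phi_{,i}}{\partial x^j} + \frac{d(\partial\mathcal{L}/\partial\phi_{,j})}{dx^j}\phi_{,i} + \frac{\partial\mathcal{L}}{\partial x^i} \qquad (16.37)$$

where we have used $\partial\dot\phi/\partial x^i = \partial\phi_{,i}/\partial t$ and $\partial\phi_{,j}/\partial x^i = \phi_{,ij} = \partial\phi_{,i}/\partial x^j$. Written this way, we see that
the first pair of terms on the RHS are the time derivative of $(\partial\mathcal{L}/\partial\dot\phi)\phi_{,i}$ at fixed position, and the
second pair of terms are the derivative with respect to $x^j$ of $(\partial\mathcal{L}/\partial\phi_{,j})\phi_{,i}$ at fixed time. The LHS
of this equation can also be written as $d\mathcal{L}/dx^i = \delta_{ij}\, d\mathcal{L}/dx^j$, so, rearranging terms, we have

$$\frac{d}{dt}\left[ \frac{\partial\mathcal{L}}{\partial\dot\phi}\phi_{,i} \right] + \frac{d}{dx^j}\left[ \frac{\partial\mathcal{L}}{\partial\phi_{,j}}\phi_{,i} - \delta_{ij}\mathcal{L} \right] = -\frac{\partial\mathcal{L}}{\partial x^i}. \qquad (16.38)$$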


There are three equations here, one for each of $i = 1, 2, 3$. If we now stipulate that $\partial\mathcal{L}/\partial x^i = 0$;
i.e. there be no explicit dependence of the Lagrangian density on position $x^i$ (which, in the underlying
lattice model, says that all of the beads and springs are in fact identical), then each of these says
that the partial time derivative of some quantity $p_i(\mathbf{x}, t)$ is minus the divergence of a vector $\mathbf{w}_i(\mathbf{x}, t)$:

$$\dot p_i = -\nabla\cdot\mathbf{w}_i \qquad (16.39)$$

where, for example, for $i = 1$ we have

$$p_x = -(\partial\mathcal{L}/\partial\dot\phi)\phi_{,x} \quad\text{and}\quad \mathbf{w}_x = \mathcal{L}\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - \phi_{,x}\begin{bmatrix} \partial\mathcal{L}/\partial\phi_{,x} \\ \partial\mathcal{L}/\partial\phi_{,y} \\ \partial\mathcal{L}/\partial\phi_{,z} \end{bmatrix}. \qquad (16.40)$$

Each of equations (16.39) is just like the equation expressing conservation of electric charge: $\dot\rho =
-\nabla\cdot\mathbf{j}$. Integrating over all space, the right hand sides of (16.39) disappear and we find

$$\frac{d\mathbf{P}}{dt} = 0 \qquad (16.41)$$

where

$$\mathbf{P}(t) = \int d^3x\, \mathbf{p}(\mathbf{x}, t) = -\int d^3x\, \frac{\partial\mathcal{L}}{\partial\dot\phi}\nabla\phi. \qquad (16.42)$$

Equation (16.41) expresses conservation of the three components of the vector $\mathbf{P}$.

Clearly, it was critical here to assume that the final term $\partial\mathcal{L}/\partial x^i$ in (16.38) vanish since, in
general, $\int d^3x\, \partial\mathcal{L}/\partial x^i \ne 0$ (this can be seen most easily from (16.36) which shows that $\partial\mathcal{L}/\partial x^i \ne
d\mathcal{L}/dx^i$, whose spatial integral does vanish; rather $\partial\mathcal{L}/\partial x^i$ contains three other terms which are not,
in general, expressible as a spatial gradient). The conservation of $\mathbf{P}$ is a direct consequence of the
invariance of the Lagrangian under shifts of position.

This is quite general. We have made no assumptions about the form of the Lagrangian density,
save that it is a function only of $\dot\phi$, $\nabla\phi$, $\phi$ and possibly $t$. For a great many systems, however,
the velocity appears in the Lagrangian density only in a kinetic energy term like $\rho\dot\phi^2/2$. This
would encompass systems like the one we have constructed, but with arbitrary potential energy.
The conserved vector is then

$$\mathbf{P} = -\rho\int d^3x\, \dot\phi\nabla\phi. \qquad (16.43)$$

What is this vector? The Lagrangian density has the dimensions of energy density, while $\phi$ is
a displacement with units of length, so it is evident from (16.42) that $\mathbf{P}$ has units of momentum.
However, this quantity is quite distinct from the sum of the ‘microscopic’ conjugate momenta which,
for our beads and springs model is $\sum p_i = M\sum\dot\phi_i$ and which, in the continuum limit becomes
$\rho\int d^3x\, \dot\phi$. This is very different from the quantity appearing in (16.43) which is of second order
in the field, while the summed microscopic momentum is first order. If this were not enough to
convince us that these are fundamentally different entities, we might also note that the microscopic
momentum vanishes for a wave, since opposite half-cycles cancel, and that, for our 1-dimensional
beads and rods system, the bead momentum is perpendicular to the wave propagation direction
while (16.42) is parallel.


The vector $\mathbf{P}$ is not included in the microscopic momentum. Both $\mathbf{P}$ and the microscopic
momentum are conserved independently. They arise from quite different symmetries; in the discrete
lattice model, conservation of the microscopic momentum follows from invariance of the Lagrangian
if we shift the entire system in space; this symmetry is the homogeneity of space itself. The
microscopic momentum is conserved even if the beads and springs are heterogeneous. The conservation of
the wave-momentum $\mathbf{P}$ follows from invariance of the Lagrangian under shifts in position along the
lattice. This is a much more restrictive condition. We will call $\mathbf{P}$ the field-momentum or the
wave-momentum. The vector $\mathbf{p}$ is the density of wave-momentum, and the tensor $w_{ij}$ is the momentum
flux density.
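To make the conservation law concrete, the sketch below evaluates the 1-dimensional analogue of (16.43), $P(t) = -\rho\int dx\, \dot\phi\, \partial\phi/\partial x$, for a superposition of two free travelling waves in a periodic box, and shows that the result does not depend on $t$. The wave amplitudes, the box size and the function names are assumptions chosen for illustration; wave-numbers are taken to be integer multiples of $2\pi/\text{box}$ so that the field is periodic.

```python
import math


def field_derivs(x, t, waves, c_s, mu):
    """Return (phi_dot, phi_x) for phi = sum of A cos(w t - k x), with w from the dispersion relation."""
    phi_dot = 0.0
    phi_x = 0.0
    for A, k in waves:
        w = math.sqrt(c_s**2 * k**2 + mu**2)
        s = math.sin(w * t - k * x)
        phi_dot += -A * w * s
        phi_x += A * k * s
    return phi_dot, phi_x


def wave_momentum(t, waves, c_s, mu, rho, box, n=512):
    """P(t) = -rho * integral of phi_dot * phi_x over one periodic box (rectangle rule)."""
    dx = box / n
    total = 0.0
    for i in range(n):
        phi_dot, phi_x = field_derivs(i * dx, t, waves, c_s, mu)
        total += phi_dot * phi_x
    return -rho * total * dx


if __name__ == "__main__":
    waves = [(1.0, 3.0), (0.5, -5.0)]  # (amplitude, wave-number) pairs
    for t in (0.0, 0.7, 1.9):
        print(t, wave_momentum(t, waves, c_s=1.0, mu=0.5, rho=2.0, box=2.0 * math.pi))
```

Each wave contributes $\rho A^2\omega k\,\text{box}/2$, positive for right-moving and negative for left-moving waves, and the cross terms integrate to zero.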

Now the other conserved quantity is the total energy $E$. Following the same line of argument as
above, expanding the time derivative of the Lagrangian density $d\mathcal{L}/dt$ we are led to

$$\frac{d}{dt}\left( \dot\phi\frac{\partial\mathcal{L}}{\partial\dot\phi} - \mathcal{L} \right) + \frac{d}{dx^j}\left( \dot\phi\frac{\partial\mathcal{L}}{\partial\phi_{,j}} \right) = -\frac{\partial\mathcal{L}}{\partial t}. \qquad (16.44)$$

Now, if the last term $\partial\mathcal{L}/\partial t = 0$, this is

$$\dot\epsilon = -\nabla\cdot\mathbf{F} \qquad (16.45)$$

with energy density $\epsilon$ being the Hamiltonian density as obtained before, and with energy current
density $F_j = \dot\phi\,\partial\mathcal{L}/\partial\phi_{,j}$. For the BRS model, and indeed for any model where the field gradient $\nabla\phi$
enters the Lagrangian density only in a term $-\rho c_s^2(\nabla\phi)^2/2$, the energy current is

$$\mathbf{F} = -c_s^2\rho\dot\phi\nabla\phi = c_s^2\mathbf{p} \qquad (16.46)$$

so the energy flux is proportional to the momentum density.

Consider a wave-packet with mean wave-number $\mathbf{k}$ and overall size $\gg 1/k$, so the packet is
essentially monochromatic. If we perform the spatial integrals for the energy and wave momentum,
then these are clearly localized in the region of the packet. As this packet moves, the energy and
momentum are transported with it. Now, averaging over a wave cycle, the energy density is
$\epsilon = \rho(\dot\phi^2 + c_s^2(\nabla\phi)^2 + \mu^2\phi^2)/2 = \rho(\omega_k^2 + c_s^2k^2 + \mu^2)\phi^2/2 = \rho\omega_k^2\phi^2$.
Integrating this over space gives the total energy $E = \rho\omega_k^2\int d^3x\, \phi^2$.
Similarly, the momentum is $\mathbf{P} = \rho\omega_k\mathbf{k}\int d^3x\, \phi^2$. The ratio of momentum to energy for a wave packet
is therefore

$$\frac{\mathbf{P}}{E} = \frac{\mathbf{k}}{\omega_k} \qquad (16.47)$$

which is compatible with the quantum mechanical relations $\mathbf{P} = \hbar\mathbf{k}$ and $E = \hbar\omega_k$.
The energy and momentum for a large classical wave-packet can be written as

$$E = \omega_k\left[ \rho\omega_k\int d^3x\, \phi^2 \right] \quad\text{and}\quad \mathbf{P} = \mathbf{k}\left[ \rho\omega_k\int d^3x\, \phi^2 \right]. \qquad (16.48)$$

The quantity in brackets is constant for a nearly-monochromatic wave packet. It can be written as
$N\hbar$, where $N$ can be thought of as the number of fundamental quanta comprising the wave packet.
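The ratio (16.47) can be verified numerically for a single real travelling wave in one dimension by integrating the energy density and the momentum density $-\rho\dot\phi\,\partial\phi/\partial x$ over one periodic box. This is a sketch under stated assumptions: a monochromatic wave stands in for the broad wave packet, and the function name and parameter values are illustrative.

```python
import math


def energy_and_momentum(A, k, c_s, mu, rho, box, t=0.0, n=256):
    """Integrate the energy and wave-momentum densities of phi = A cos(w t - k x) over one box."""
    w = math.sqrt(c_s**2 * k**2 + mu**2)
    dx = box / n
    E = 0.0
    P = 0.0
    for i in range(n):
        theta = w * t - k * (i * dx)
        phi = A * math.cos(theta)
        phi_dot = -A * w * math.sin(theta)
        phi_x = A * k * math.sin(theta)
        E += 0.5 * rho * (phi_dot**2 + c_s**2 * phi_x**2 + mu**2 * phi**2)
        P += -rho * phi_dot * phi_x
    return E * dx, P * dx, w


if __name__ == "__main__":
    E, P, w = energy_and_momentum(A=0.3, k=4.0, c_s=1.5, mu=2.0, rho=1.0, box=2.0 * math.pi)
    print(P / E, 4.0 / w)   # the two numbers should agree
    print(E / w, P / 4.0)   # the common bracketed factor in (16.48)
```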


16.4 Energy and Momentum in the BRS Model 


Specializing to the BRS dispersion relation $\omega_k^2 = c_s^2k^2 + \mu^2$ (16.31), the energy-momentum relation
for a wave-packet is

$$E^2 = P^2c_s^2 + m^2c_s^4 \qquad (16.49)$$

where $m$, which has dimensions of mass, is defined as

$$m = N\hbar\mu/c_s^2. \qquad (16.50)$$

Equation (16.49) is identical in form to the energy-momentum relation for a relativistic particle
of mass $m$, but with the speed of light replaced by $c_s$, the asymptotic wave-velocity for highly
energetic wave packets.

The group velocity is related to the wave-vector by $\mathbf{v}_g = c_s^2\mathbf{k}/\omega_k$, or equivalently by $\mathbf{v}_g =
c_s^2\mathbf{k}/\sqrt{c_s^2k^2 + \mu^2}$ by (16.33). Solving for $\mathbf{k}$ gives

$$\mathbf{k} = \frac{\mu}{c_s^2}\frac{\mathbf{v}_g}{\sqrt{1 - v_g^2/c_s^2}} \qquad (16.51)$$

or from (16.48), (16.50)

$$\mathbf{P} = \gamma m\mathbf{v}_g \quad\text{and}\quad E = \gamma mc_s^2 \qquad (16.52)$$

with

$$\gamma = (1 - v_g^2/c_s^2)^{-1/2}. \qquad (16.53)$$

These results are exactly analogous to the momentum-velocity relation for a relativistic particle.
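The relativistic-looking relations above can be checked with a few lines of arithmetic. In the sketch below the argument `action` stands for the constant bracketed factor $N\hbar$ of (16.48), and the mass is taken to be `action * mu / c_s**2`, which is the value required for $E = N\hbar\omega_k$ and $P = N\hbar k$ to satisfy (16.49); the function name and numbers are assumptions for illustration.

```python
import math


def packet_kinematics(k, c_s, mu, action=1.0):
    """Energy, momentum, mass, group velocity and gamma factor for a 1-d wave packet."""
    w = math.sqrt(c_s**2 * k**2 + mu**2)   # dispersion relation
    E = action * w                          # E = (N hbar) omega_k
    P = action * k                          # P = (N hbar) k
    m = action * mu / c_s**2                # effective mass
    v_g = c_s**2 * k / w                    # group velocity
    gamma = 1.0 / math.sqrt(1.0 - v_g**2 / c_s**2)
    return {"E": E, "P": P, "m": m, "v_g": v_g, "gamma": gamma}


if __name__ == "__main__":
    q = packet_kinematics(k=2.0, c_s=3.0, mu=1.5, action=0.8)
    print(q["E"] ** 2, q["P"] ** 2 * 3.0**2 + q["m"] ** 2 * 3.0**4)
    print(q["P"], q["gamma"] * q["m"] * q["v_g"])
    print(q["E"], q["gamma"] * q["m"] * 3.0**2)
```

Each printed pair should agree to rounding error.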


16.5 Covariance of the BRS Model 


Waves in the 1-dimensional BRS model are not relativistically covariant. There is a specific inertial
frame, that in which the system as a whole is stationary, in which the waves obey the dispersion
relation (16.31). If we have a wave of a certain spatial frequency traveling along, then if we run
alongside it it will have the same spatial frequency, but now $\omega = 0 \ne \omega_k$. However, consider the
combined transformation on the space and time coordinates

$$\begin{bmatrix} c_st' \\ x' \end{bmatrix} = \Lambda\begin{bmatrix} c_st \\ x \end{bmatrix} = \begin{bmatrix} \gamma & -\beta\gamma \\ -\beta\gamma & \gamma \end{bmatrix}\begin{bmatrix} c_st \\ x \end{bmatrix} = \begin{bmatrix} \gamma(c_st - \beta x) \\ \gamma(x - \beta c_st) \end{bmatrix} \qquad (16.54)$$

where $\gamma = (1 - \beta^2)^{-1/2}$.
This is very similar to a Lorentz transformation. If we consider the origin in the un-primed coordinate
system $x = 0$, this moves along the path $x' = -\beta\gamma c_st$, $t' = \gamma t$, i.e. with speed $v' = dx'/dt' = -\beta c_s$.
This transformation therefore corresponds to a boost with the origin in $x', t'$ coordinates moving at
velocity $v = \beta c_s$ in the un-primed coordinate space.

It is easy to show that the Lagrangian density, equation of motion for the field etc., are all
invariant under this transformation. This is very useful. It tells us that if we have one solution of
the field equation $\phi(\vec x) = \phi(x, c_st)$ then the function

$$\phi'(\vec x) = \phi(\Lambda\vec x) \qquad (16.55)$$

is also a solution. Here $\Lambda$ can be the transformation matrix for an arbitrary boost.

As an illustration, consider a plane wave $\phi(x,t) = \phi_0e^{i\psi(x,t)}$, with phase $\psi(x,t) = \omega t - kx$ with
$\omega = \omega_k$. Applying the transformation for a boost gives $\psi = \omega't' - k'x'$ with

$$\begin{bmatrix} \omega'/c_s \\ k' \end{bmatrix} = \begin{bmatrix} \gamma & -\beta\gamma \\ -\beta\gamma & \gamma \end{bmatrix}\begin{bmatrix} \omega/c_s \\ k \end{bmatrix}. \qquad (16.56)$$

One can readily show that $\omega'^2 = c_s^2k'^2 + \mu^2$, so $\omega'$ and $k'$ still satisfy the dispersion relation (16.31):
$\omega' = \omega_{k'}$. Thus a wave-like solution in transformed coordinates is also a solution of the field equations,
but with frequency and wave-number (i.e. energy and momentum) transformed appropriately. An
arbitrary solution can be written as a sum of plane-waves, so the result (16.55) is quite general.
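A short numerical check of these statements: boosting the coordinates with (16.54) and the frequency and wave-number with the matching matrix leaves the phase $\omega t - kx$ unchanged, and the boosted pair still satisfies the dispersion relation. The sketch assumes the sign convention in which the phase is invariant; function names and sample values are illustrative.

```python
import math


def boost_coords(t, x, beta, c_s):
    """Apply (16.54): c_s t' = gamma (c_s t - beta x), x' = gamma (x - beta c_s t)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (t - beta * x / c_s), gamma * (x - beta * c_s * t)


def boost_wave(w, k, beta, c_s):
    """Transform (omega / c_s, k) with the same matrix, so that omega t - k x is invariant."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    w_over_c = w / c_s
    return c_s * gamma * (w_over_c - beta * k), gamma * (k - beta * w_over_c)


if __name__ == "__main__":
    c_s, mu, beta, k = 2.0, 1.0, 0.6, 1.5
    w = math.sqrt(c_s**2 * k**2 + mu**2)
    w2, k2 = boost_wave(w, k, beta, c_s)
    t2, x2 = boost_coords(0.4, -1.3, beta, c_s)
    print(w2**2 - c_s**2 * k2**2, mu**2)            # dispersion relation preserved
    print(w * 0.4 - k * (-1.3), w2 * t2 - k2 * x2)  # phase preserved
```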

The generalization to 2 or 3 dimensions is quite straightforward. Just as for the relativistic 
Lorentz transformation, distances perpendicular to the boost are unaffected. 


16.6 Interactions in Classical Field Theory 


The Lagrangian density (16.14) is somewhat idealized in that the spring forces are assumed to
be perfectly linear in the field displacements (so the spring energies are perfectly quadratic). Real
springs are quadratic only in the limit of vanishingly small displacements, and a more realistic model
for the spring potential energy would be to have a potential energy density

$$V(\phi) = \rho\mu^2\phi^2/2 + \lambda\phi^4. \qquad (16.57)$$

This kind of non-quadratic potential can also be realized with an ideal spring as illustrated in
figure 16.4. With this modification, the pure non-interacting plane-wave solutions are no longer
exact solutions of the wave equation. The non-quadratic contribution to the potential energy will
introduce a coupling, or interaction, between the waves, and the contribution to the Hamiltonian
density is called the interaction Hamiltonian.

This specific type of interaction is called a self interaction since the $\phi$ field interacts with itself.
Another type of interaction is to couple two or more different fields. One way to introduce an
interaction between two fields $\phi$ and $\chi$ of the type we have been discussing is shown in figure 16.5.
If we assume the field displacements are small and perform an expansion we find that the stretching
of the $K_\chi$ spring is

$$\Delta l = \chi + \frac{1}{2}\frac{\phi^2}{a} - \frac{1}{2}\frac{\phi^2\chi}{a^2} \qquad (16.58)$$

plus terms which are fourth order or higher in the fields. The potential energy is then

$$V(\phi, \chi) = \frac{1}{2}K_\phi\phi^2 + \frac{K_\chi}{2}\left[ \chi^2 + \frac{\phi^2\chi}{a} - \frac{\phi^2\chi^2}{a^2} + \frac{1}{4}\frac{\phi^4}{a^2} \right] \qquad (16.59)$$


plus terms of fifth order or higher in the fields. We see that this modification to our model has
introduced two types of interaction between the fields, one of the form $\alpha\phi^2\chi$ and another proportional
to $\phi^2\chi^2$, and a $\phi^4$ type self-interaction of the $\phi$ field.

Figure 16.4: A self-interacting field can be realized with a slight modification to the original model.
Here the bead slides along the horizontal rod, but is tethered by an orthogonally connected spring.
The length of the spring is $l = \sqrt{a^2 + \phi^2}$. If the relaxed spring has length $l_0$, the energy in the spring
is $V(\phi) = K(l - l_0)^2/2$. This gives a potential which, for small displacement $\phi$ can be expanded as
$V(\phi) = \mathrm{constant} + \mu^2\phi^2/2 + \lambda\phi^4 + \ldots$. The ‘mass term’ coefficient is $\mu^2/2 = K(1 - l_0/a)/2$. If $a > l_0$
(i.e. the spring is in tension when $\phi = 0$) this is positive. The coupling coefficient is $\lambda = Kl_0/8a^3$
which is always positive. For $a < l_0$ (i.e. the spring is in compression for $\phi = 0$) the mass term is
negative. In this case the potential has a ‘w’ shape, $\phi = 0$ is a point of unstable equilibrium, and
there are two asymmetric minima $\phi = \pm\phi_0$. This is an ‘end on’ view of a chain of these units
(think of multiple replicas stacked perpendicular to the page) and the connecting arms and springs
are not shown.
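The qualitative behavior of the tethered-bead potential is easy to explore numerically. The sketch below evaluates $V(\phi) = K(l - l_0)^2/2$ with $l = \sqrt{a^2 + \phi^2}$, estimates the curvature at $\phi = 0$ by finite differences (which should equal $K(1 - l_0/a)$), and locates the minima of the ‘w’ shaped potential at $\phi_0 = \sqrt{l_0^2 - a^2}$, where the spring is exactly relaxed. The function names and the step size are assumptions for illustration.

```python
import math


def spring_potential(phi, K, a, l0):
    """Energy of a spring of relaxed length l0 stretched to length sqrt(a^2 + phi^2)."""
    return 0.5 * K * (math.hypot(a, phi) - l0) ** 2


def curvature_at_origin(K, a, l0, h=1e-4):
    """Finite-difference estimate of V''(0); analytically this is K (1 - l0 / a)."""
    v = lambda p: spring_potential(p, K, a, l0)
    return (v(h) - 2.0 * v(0.0) + v(-h)) / h**2


def minimum_location(a, l0):
    """Position phi_0 of the minima when the spring is compressed at phi = 0 (l0 > a)."""
    if l0 <= a:
        raise ValueError("the only minimum is at phi = 0 when l0 <= a")
    return math.sqrt(l0**2 - a**2)


if __name__ == "__main__":
    print(curvature_at_origin(1.0, 1.0, 0.5))   # tension: positive mass term
    print(curvature_at_origin(1.0, 1.0, 2.0))   # compression: negative mass term
    print(minimum_location(1.0, 2.0))
```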


Consider the $\mathcal{H}_{\mathrm{int}} = \alpha\phi^2\chi$ interaction term. This will introduce a term in the Lagrangian density
$\mathcal{L}_{\mathrm{int}} = -\mathcal{H}_{\mathrm{int}}$ and will therefore introduce a term $\partial\mathcal{L}_{\mathrm{int}}/\partial\chi = -\alpha\phi^2$ on the right hand side of the
Euler-Lagrange equation for the $\chi$ field, which becomes

$$\ddot\chi - c_s^2\nabla^2\chi + \mu_\chi^2\chi + \alpha\phi^2 = 0. \qquad (16.60)$$

From the point of view of the $\chi$ field, the interaction looks like a force proportional to $\phi^2$, and
consequently oscillations of the $\phi$ field can excite oscillations of the $\chi$ field. Now the most efficient
way to excite an oscillation mode of the $\chi$ field is to drive it with a force which looks like the velocity
$\dot\chi$ for that mode (which is also a traveling wave). Consider the situation where (in the absence of
the interaction) the $\phi$ field consists of two plane waves:

$$\phi(\vec x) = \cos(\vec k_1\cdot\vec x) + \cos(\vec k_2\cdot\vec x) \qquad (16.61)$$

where $\vec k = (\omega/c_s, \mathbf{k})$ and where $\omega$ and $\mathbf{k}$ satisfy the $\phi$-field dispersion relation $\omega^2 = c_s^2k^2 + \mu_\phi^2$.
Squaring this, we see that the force in the $\chi$-field equation of motion is proportional to

$$\phi^2 = \cos^2(\vec k_1\cdot\vec x) + 2\cos(\vec k_1\cdot\vec x)\cos(\vec k_2\cdot\vec x) + \cos^2(\vec k_2\cdot\vec x)$$
$$\phantom{\phi^2} = 1 + \tfrac{1}{2}\cos(2\vec k_1\cdot\vec x) + \cos((\vec k_1 + \vec k_2)\cdot\vec x) + \cos((\vec k_1 - \vec k_2)\cdot\vec x) + \tfrac{1}{2}\cos(2\vec k_2\cdot\vec x). \qquad (16.62)$$

The force therefore contains four contributions which look like plane waves with various temporal
and spatial frequencies. The second term here, for example is $\cos((\vec k_1 + \vec k_2)\cdot\vec x) = \cos((\omega_1 + \omega_2)t -
(\mathbf{k}_1 + \mathbf{k}_2)\cdot\mathbf{x})$. This will be efficient at exciting oscillations of the $\chi$ field if $\Omega = \omega_1 + \omega_2$ and $\mathbf{q} = \mathbf{k}_1 + \mathbf{k}_2$
satisfy the $\chi$-field dispersion relation $\Omega^2 = c_s^2q^2 + \mu_\chi^2$. For example, let’s assume that $\mu_\chi > 2\mu_\phi$
(so that $\phi$ waves with frequency $\mu_\chi/2$ exist). We
could then collide two plane $\phi$-field waves with $\omega_1 = \omega_2 = \mu_\chi/2$ and $\mathbf{k}_1 = -\mathbf{k}_2$. The result is a
component in the force with temporal frequency $\Omega = \mu_\chi$ and $\mathbf{q} = \mathbf{k}_1 + \mathbf{k}_2 = 0$, which is just what is
needed to excite the $\mathbf{k} = 0$ mode of the $\chi$-field.

Figure 16.5: We can modify our beads and springs model to introduce an interaction term in the
total Hamiltonian. To do this, we need two separate sets of coupled oscillators to represent the $\phi$
and $\chi$ fields. We will assume that the beads all have the same mass and that the coupling springs
are identical, so the asymptotic high-frequency group velocity is the same for both types of waves.
The base springs, which determined the oscillation frequency for low-$k$ waves, will generally be
different for the two fields. We now lay the $\phi$ oscillators on their side, and we connect the $\chi$-bead
base spring to the $\phi$-bead as shown. The displacements of the beads from their rest positions (where
the springs are relaxed) are indicated. This is an ‘end on’ view of the chain of these units (think
of multiple replicas stacked perpendicular to the page) and the connecting arms and springs are
not shown.

More generally, we expect strong coupling between a pair of $\phi$-modes with spatial frequencies $\mathbf{k}_1$
and $\mathbf{k}_2$ and a $\chi$-mode with wave-number $\mathbf{q}$ provided the resonance conditions

$$\mathbf{q} = \mathbf{k}_1 + \mathbf{k}_2, \qquad \Omega_q = \omega_1 + \omega_2 \qquad (16.63)$$

are satisfied.
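The resonance conditions can be tested for any pair of $\phi$ waves by asking how far the sum frequency and sum wave-number are from the $\chi$-field dispersion relation. The 1-dimensional sketch below does this; the function names are assumptions, and the head-on example requires $\mu_\chi \ge 2\mu_\phi$ so that $\phi$ waves of frequency $\mu_\chi/2$ exist.

```python
import math


def chi_resonance_mismatch(k1, k2, c_s, mu_phi, mu_chi):
    """Return Omega^2 - (c_s^2 q^2 + mu_chi^2) for Omega = w1 + w2 and q = k1 + k2."""
    w1 = math.sqrt(c_s**2 * k1**2 + mu_phi**2)
    w2 = math.sqrt(c_s**2 * k2**2 + mu_phi**2)
    big_omega = w1 + w2
    q = k1 + k2
    return big_omega**2 - (c_s**2 * q**2 + mu_chi**2)


def head_on_wavenumber(c_s, mu_phi, mu_chi):
    """Wave-number of a phi wave with frequency mu_chi / 2."""
    if mu_chi < 2.0 * mu_phi:
        raise ValueError("need mu_chi >= 2 mu_phi for phi waves of frequency mu_chi / 2")
    return math.sqrt((0.5 * mu_chi) ** 2 - mu_phi**2) / c_s


if __name__ == "__main__":
    c_s, mu_phi, mu_chi = 1.0, 1.0, 3.0
    k = head_on_wavenumber(c_s, mu_phi, mu_chi)
    print(chi_resonance_mismatch(k, -k, c_s, mu_phi, mu_chi))  # head-on collision: resonant
    print(chi_resonance_mismatch(k, k, c_s, mu_phi, mu_chi))   # co-propagating: off resonance
```

A mismatch of zero means the driving term sits exactly on the $\chi$-field dispersion relation; the head-on pair drives the $q = 0$ mode at frequency $\mu_\chi$.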

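If we consider instead the interaction $\mathcal{H}_{\mathrm{int}} = \lambda\phi^2\chi^2/2$ we find that the equations of motion
become

$$\ddot\phi - c_s^2\nabla^2\phi + \mu_\phi^2\phi + \lambda\chi^2\phi = 0$$
$$\ddot\chi - c_s^2\nabla^2\chi + \mu_\chi^2\chi + \lambda\phi^2\chi = 0. \qquad (16.64)$$

Using the same line of argument above, one can readily show that the interaction can efficiently
transfer energy from a pair of $\phi$-modes $\mathbf{k}_1$, $\mathbf{k}_2$ to a pair of $\chi$-modes $\mathbf{q}_1$, $\mathbf{q}_2$ provided

$$\mathbf{q}_1 + \mathbf{q}_2 = \mathbf{k}_1 + \mathbf{k}_2, \qquad \Omega_1 + \Omega_2 = \omega_1 + \omega_2. \qquad (16.65)$$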


In the equation of motion for the amplitude for the mode $\mathbf{q}_1$, the combination of modes $\mathbf{k}_1$, $\mathbf{k}_2$ and
$\mathbf{q}_2$ produces a resonant force. Things are a little different from the $\phi\phi \to \chi$ process above, since
here the interaction is not effective if, for instance, the $\chi$-field vanishes initially. Similarly, with
$\mathcal{H}_{\mathrm{int}} = \alpha\phi^2\chi$, the process $\chi \to \phi\phi$ is not effective classically unless there is some initial power in
the $\phi$ field at the appropriate frequencies. We can see this from the model in figure 16.5. If $\phi = 0$
initially, then no amount of wiggling of the $\chi$-field can excite the $\phi$ oscillations.

We will see that the quantum mechanical transition amplitudes for these processes contain a
4-dimensional $\delta$-function which enforces these resonance conditions, and at the same time enforces
conservation of total energy and wave-momentum.

We obtained the conservation of wave-momentum in §16.3 for a single field. However, the derivation
can easily be extended to interacting fields, and it is easy to show that for any coupling of the
form $\mathcal{L}_{\mathrm{int}} = V(\phi, \chi)$ the sum of the wave-momenta for the various fields is conserved.


16.7 Wave-Momentum Puzzles 

Conservation of wave-momentum in the ‘scalar elasticity’ model considered here raises some inter¬ 
esting puzzles. 

• Consider a ring of coupled oscillators (with beads sliding on vertical rods) mounted on a 
stationary turntable, and let there be a wave propagating around in a particular direction, say 
clockwise. This wave carries momentum around the ring, yet there is clearly no motion of any 
material around the ring. Now imagine there is a small amount of friction which causes the
wave to damp; does the turntable absorb the momentum and start to spin? If not, where
did the wave momentum go?

• For the same system, consider an external agent who applies forces to the particles in this 
system in order to excite a clockwise propagating wave. Does this agent experience a recoil? 

• Consider a linear chain of oscillators mounted on a stationary skate-board with a momentum 
carrying wave propagating along it. Let the end bead be fixed to its rod, so that the wave is 
reflected and its momentum is reversed. Does the skate-board start moving? 

What is amusing about these puzzles is that a student with only an elementary knowledge of 
dynamics would have no hesitation in answering (correctly) that the turntable does not spin up; 
that the forcing agent feels no recoil (there being no force applied in the direction of the wave- 
momentum) and that the skate-board does not suddenly accelerate. It is only with the benefit of 
the mathematically sophisticated Lagrangian treatment that we are prepared to contemplate such 
preposterous notions. 

The resolution of the apparent conflict between the naive and sophisticated treatment in most 
of the above examples is that we have broken the symmetry of the system which was required in 
order that wave-momentum be conserved. When we apply a force, or when we pin one of the beads, 
the system is no longer symmetrical and wave momentum is no longer conserved. In the second
example, for instance, wave-momentum is not conserved while the force acts (so wave-momentum is
created with no recoil), but once the force stops acting the wave momentum is conserved.

The first example is a little different. Here we need to think more carefully about what it means 
to add friction. Fundamentally, this can be thought of as exciting phonons in another lattice; that of 
the material comprising the turntable. We could model this, in principle, by adding some coupling 
term to our Lagrangian; this would allow the momentum to be transferred from the waves on our 
lattice to sound waves in the turntable. If the Lagrangian for the latter were spatially homogeneous 
then the wave-momentum would still be preserved, but would not reveal itself as net rotation; to 
see the momentum we would need to look carefully at the sound waves to see that they are in fact 
anisotropic. Realistically, the fact that the turntable is finite means that its Lagrangian density is 
not translationally invariant and therefore the wave-momentum would not be conserved. The same 
is true for impurities in the material which would scatter and isotropize the wave-energy. 

If wave-momentum conservation is so easily broken then what use is this concept? The answer is 
that we believe that in reality the Lagrangian is perfectly symmetrical under spatial translations. A 
real impurity in a lattice isn’t really an asymmetric term in the Lagrangian, as we have pretended, 
rather it is an asymmetric configuration of some other field, which is coupled to our lattice waves 
via an interaction term which is symmetric under translations.


16.8 Conservation of ‘Charge’

Consider two fields $a(\mathbf{x}, t)$, $b(\mathbf{x}, t)$ obeying the free-field ‘scalar elasticity’ field equations

$$\ddot a - c_s^2 \nabla^2 a + \mu^2 a = 0, \qquad \ddot b - c_s^2 \nabla^2 b + \mu^2 b = 0. \qquad (16.66)$$


These fields have the identical parameters $c_s$, $\mu$ but are completely independent of one another. Now
multiply the first by $b$ and the second by $a$ and subtract. The terms involving $\mu$ cancel and we have

$$b\ddot a - a\ddot b = c_s^2 (b\nabla^2 a - a\nabla^2 b). \qquad (16.67)$$


Now the first term on the LHS can be written as $b\ddot a = \partial(b\dot a)/\partial t - \dot b\dot a$ and similarly for the second term.
The first term in parentheses on the RHS is likewise $b\nabla^2 a = \nabla\cdot(b\nabla a) - \nabla b\cdot\nabla a$ and similarly
for the second. With these substitutions, the terms involving $\dot a\dot b$ on the LHS cancel, as do those
involving $\nabla a\cdot\nabla b$ on the RHS, and we obtain

$$\frac{\partial}{\partial t}(b\dot a - a\dot b) = -c_s^2 \nabla\cdot(a\nabla b - b\nabla a). \qquad (16.68)$$


This is clearly a conservation law, of the familiar form $\partial n/\partial t = -\nabla\cdot\mathbf{j}$ with ‘density’ $n = b\dot a - a\dot b$ and
‘current’ $\mathbf{j} = c_s^2(a\nabla b - b\nabla a)$. An immediate consequence of this is that the integral of the density
$Q = \int d^3x\, n(\mathbf{x}, t)$ is a constant.

This is very peculiar. Why should two independent fields, when combined in this way, have such
a conservation law? What is the physical meaning of the conserved ‘charge’ $Q$ here?

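Before trying to answer these questions, let’s explore this in a slightly more general and also
more formal manner. First, let’s consider $a$ and $b$ to be the real and imaginary parts of a complex
scalar field $\phi$:

$$\phi = a + ib \quad\text{and}\quad \phi^* = a - ib \qquad (16.69)$$

from which $a$ and $b$ can be recovered as

$$a = (\phi + \phi^*)/2 \quad\text{and}\quad b = (\phi - \phi^*)/2i. \qquad (16.70)$$

The Lagrangian density is

$$\mathcal{L} = \frac{\rho}{2}\left(\dot a^2 - c_s^2(\nabla a)^2 - \mu^2 a^2 + \dot b^2 - c_s^2(\nabla b)^2 - \mu^2 b^2\right) \qquad (16.71)$$

which, in terms of $\phi$, $\phi^*$ is

$$\mathcal{L} = \frac{\rho}{2}\left(\dot\phi\dot\phi^* - c_s^2\nabla\phi\cdot\nabla\phi^* - \mu^2\phi\phi^*\right). \qquad (16.72)$$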


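In (16.72), $\phi$, $\phi^*$, $\nabla\phi$, $\nabla\phi^*$, $\dot\phi$ and $\dot\phi^*$ are all considered to be independent fields. (This is
just like how the real scalar field Lagrangian $\mathcal{L}(\dot\phi, \nabla\phi, \phi)$ is considered to be a function of three
independent fields $\phi$, $\dot\phi$ and $\nabla\phi$.) The equations of motion are obtained, as usual, by requiring that
the action $S = \int dt \int d^3x\, \mathcal{L}$ be stationary with respect to variations $\phi \to \phi + \delta\phi$ and $\phi^* \to \phi^* + \delta\phi^*$.

The former variation gives

$$\frac{\partial}{\partial t}\left(\frac{\partial\mathcal{L}}{\partial\dot\phi}\right) + \frac{\partial}{\partial x^i}\left(\frac{\partial\mathcal{L}}{\partial\phi_{,i}}\right) - \frac{\partial\mathcal{L}}{\partial\phi} = 0 \qquad (16.73)$$

which, in our case, yields

$$\ddot\phi^* - c_s^2\nabla^2\phi^* + \mu^2\phi^* = 0 \qquad (16.74)$$

and the latter gives an identical equation for $\phi$. These equations of motion, with (16.69), are
equivalent to equations (16.66).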


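Now the Lagrangian density (16.72) is rather symmetrical. In particular, it is invariant under
the transformation

$$\phi \to \phi' = e^{i\theta}\phi \quad\text{and}\quad \phi^* \to \phi'^* = e^{-i\theta}\phi^* \qquad (16.75)$$

where $\theta$ is an arbitrary constant. This provides a nice way to obtain the conservation law (16.68).
We can think of the transformed field $\phi'$ as a function of $\mathbf{x}$, $t$ and $\theta$, but the invariance of $\mathcal{L}$ implies
that the total derivative of $\mathcal{L}$ with respect to the parameter $\theta$ vanishes:

$$\frac{d\mathcal{L}}{d\theta} = \frac{\partial\mathcal{L}}{\partial\phi}\frac{\partial\phi}{\partial\theta} + \frac{\partial\mathcal{L}}{\partial\phi_{,i}}\frac{\partial\phi_{,i}}{\partial\theta} + \frac{\partial\mathcal{L}}{\partial\dot\phi}\frac{\partial\dot\phi}{\partial\theta} + \frac{\partial\mathcal{L}}{\partial\phi^*}\frac{\partial\phi^*}{\partial\theta} + \frac{\partial\mathcal{L}}{\partial\phi^*_{,i}}\frac{\partial\phi^*_{,i}}{\partial\theta} + \frac{\partial\mathcal{L}}{\partial\dot\phi^*}\frac{\partial\dot\phi^*}{\partial\theta} = 0. \qquad (16.76)$$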


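Using the equations of motion (16.73) and its complex conjugate to eliminate $\partial\mathcal{L}/\partial\phi$ and $\partial\mathcal{L}/\partial\phi^*$
and using $\partial\dot\phi/\partial\theta = \partial(\partial\phi/\partial\theta)/\partial t = i\,\partial\phi/\partial t$ etc. this becomes

$$\frac{\partial n}{\partial t} = -\nabla\cdot\mathbf{j} \qquad (16.77)$$

with density

$$n = i\left(\frac{\partial\mathcal{L}}{\partial\dot\phi}\phi - \frac{\partial\mathcal{L}}{\partial\dot\phi^*}\phi^*\right) \qquad (16.78)$$

and current

$$\mathbf{j} = i\left(\frac{\partial\mathcal{L}}{\partial\nabla\phi}\phi - \frac{\partial\mathcal{L}}{\partial\nabla\phi^*}\phi^*\right). \qquad (16.79)$$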

This is an example of Noether’s theorem, which says that if the Lagrangian density is invariant
under some continuous transformation such as (16.75) there is a corresponding conservation law.
This specific type of transformation is known as a global gauge transformation.

For the free field Lagrangian (16.72) the density and current are, up to an overall constant factor,

$$n = i(\phi\dot\phi^* - \phi^*\dot\phi) \qquad (16.80)$$

and

$$\mathbf{j} = -i c_s^2(\phi\nabla\phi^* - \phi^*\nabla\phi) \qquad (16.81)$$

which we could have readily obtained from (16.68) using (16.70).

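The current density $\mathbf{j}$ is reminiscent of the momentum density $\mathbf{p}$. Recall that conservation of
the wave-momentum for a single real field was obtained by considering the total derivative of the
Lagrangian density with respect to position, $d\mathcal{L}/dx_i$, which yielded $\mathbf{p} = -\nabla\phi\,\partial\mathcal{L}/\partial\dot\phi$. Applying the
same line of reasoning for the complex field yields

$$\mathbf{p} = -\left(\nabla\phi\frac{\partial\mathcal{L}}{\partial\dot\phi} + \nabla\phi^*\frac{\partial\mathcal{L}}{\partial\dot\phi^*}\right) \qquad (16.82)$$

or, for the Lagrangian (16.72),

$$\mathbf{p} = -\frac{\rho}{2}(\dot\phi\nabla\phi^* + \dot\phi^*\nabla\phi). \qquad (16.83)$$

With (16.69) this is easily shown to be equivalent to the sum of the momentum densities for the
fields $a$ and $b$:

$$\mathbf{p} = -\rho(\dot a\nabla a + \dot b\nabla b). \qquad (16.84)$$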


Now consider a plane-wave $\phi = e^{i(\omega_k t - \mathbf{k}\cdot\mathbf{x})}$. This has momentum density $\mathbf{p} = \rho\omega_k\mathbf{k}\phi\phi^*$ which
is parallel to $\mathbf{k}$, as befits a wave propagating in the direction $\mathbf{k}$, and the same is true for the
wave $\phi' = e^{-i(\omega_k t - \mathbf{k}\cdot\mathbf{x})}$ which also propagates in the direction $\mathbf{k}$. The current for the field $\phi$ is
$-i c_s^2(\phi\nabla\phi^* - \phi^*\nabla\phi)$, which is also parallel to $\mathbf{k}$, but the current for the field $\phi'$ has opposite sign. The
same is true for wave packets; a packet with positive frequency $\phi \propto e^{+i\omega_k t}$ has positive charge $Q$,
which it transports in the direction $\mathbf{k}$, giving a positive current. A negative frequency wave packet,
with $\phi \propto e^{-i\omega_k t}$, has a negative charge and has a negative current in the direction $\mathbf{k}$.
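
The sign pattern described above can be checked directly. The following is a minimal sketch in one space dimension; it assumes unit density and the normalization $n = i(\phi\dot\phi^* - \phi^*\dot\phi)$, $j = -i c_s^2(\phi\,\partial_x\phi^* - \phi^*\partial_x\phi)$, $p = -(\rho/2)(\dot\phi\,\partial_x\phi^* + \dot\phi^*\partial_x\phi)$. The function name `densities` and the parameter values are illustrative choices, not taken from the text.

```python
import cmath
import math

def densities(sign, k, x, t, cs=1.0, mu=1.0, rho=1.0):
    """Charge density n, current j and momentum density p for the plane wave
    phi = exp(sign * i * (omega*t - k*x)); sign=+1 is positive frequency."""
    omega = math.sqrt(cs**2 * k**2 + mu**2)
    phi = cmath.exp(sign * 1j * (omega * t - k * x))
    phi_t = sign * 1j * omega * phi       # time derivative of phi
    phi_x = -sign * 1j * k * phi          # spatial derivative of phi
    pc, pc_t, pc_x = phi.conjugate(), phi_t.conjugate(), phi_x.conjugate()
    n = 1j * (phi * pc_t - pc * phi_t)
    j = -1j * cs**2 * (phi * pc_x - pc * phi_x)
    p = -0.5 * rho * (phi_t * pc_x + pc_t * phi_x)
    return n.real, j.real, p.real

# Both waves carry momentum along +k, but their charge and current differ in sign.
for sign in (+1, -1):
    print(sign, densities(sign, k=2.0, x=0.3, t=0.7))
```

Under these assumptions the two waves have the same momentum density, while $n$ and $j$ both change sign between them.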

The ‘charge’ for these quasi-monochromatic wave packets here is very similar to circular
polarization for an electro-magnetic field. What we have called a positively charged field is one in which
the field $a$ lags $b$ by 90 degrees and vice versa. If the two fields are in phase ($a = b$) we have a
linearly polarized wave and the charge density and current vanish. There is a difference, however,
in that the electro-magnetic field vector $\mathbf{E}$ ‘lives’ in the real 3-dimensional space, whereas here the
vector $\phi = (a, b)$ exists in a relatively abstract internal 2-dimensional space.

Figure 16.6: This modification to the BRS model provides an interaction between two fields $a(\mathbf{x},t)$
and $b(\mathbf{x},t)$. What we have here is two beads on rods with springs, much as in the original model,
but with a cord attached to one bead which passes over the frictionless pulley at the upper right and
then connects to the other bead through the spring with spring constant $K'$. The length of the $K$ springs,
when relaxed, is $l_0$. If the relaxed length of the cord plus spring connecting the beads is $l'_0$, then, for
small $a$, $b$ the potential energy is $V(a, b) = K(a^2 + b^2)/2 + K'((2l_0 - l'_0) + (a^2 + b^2)/2l_0 + \ldots)^2$. This
is of the form $V(a, b) = \mu a^2/2 + \mu b^2/2 + \lambda(a^2 + b^2)^2 + \ldots$. The first two terms are the usual free-field
mass terms while the last term provides a self-interaction. If we represent the fields as $\phi = a + ib$
then the potential term is $V(\phi) = \lambda|\phi|^4$. This interaction respects the conservation law (16.68).

For free fields, the physical meaning of all this is somewhat questionable, since, as emphasized at
the outset, the fields $a$, $b$ are completely independent of one another. The mysterious conservation
law (16.68) is nothing more than the statement that

$$\frac{\ddot a - c_s^2\nabla^2 a}{a} = \frac{\ddot b - c_s^2\nabla^2 b}{b} \qquad (16.85)$$

i.e. the parameter $\mu$ is the same in the two equations of motion. The physical implication of this kind
of conservation law only really emerges when we consider a pair of interacting fields. We can introduce
an interaction between the $a$ and $b$ fields by adding a term $-V(a, b)$ to the Lagrangian density
$\mathcal{L}$. This in turn adds a term $-\partial V/\partial a$ to the RHS of the equation of motion for $a$ in (16.66) and
similarly for $b$. In general, this will violate the conservation law (16.68), since we add to the RHS
a term $a\,\partial V/\partial b - b\,\partial V/\partial a$. However, under certain circumstances this term will also vanish. It
is easy to see that this will be the case if the interaction potential $V$ is a function of $a^2 + b^2$:
i.e. $V(a, b) = F(a^2 + b^2)$, in which case $a\,\partial V/\partial b = 2abF' = b\,\partial V/\partial a$. This result is neatly expressed
in the complex field formalism; this particular potential is equivalently $V(\phi) = F(\phi\phi^*)$, which is
clearly invariant under the global gauge transformation (16.75), so the Lagrangian density for this
interacting field theory is also invariant. A realization of such an interaction is illustrated in figure
16.6.
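
The conservation of $Q$ in the presence of a symmetric interaction can be checked on a small periodic lattice. The following is a minimal numerical sketch, not part of the original development: it assumes one space dimension, unit density, a kick-drift-kick (velocity Verlet) time step, and the interaction $V = \lambda(a^2 + b^2)^2$. The function names `accel`, `charge`, `evolve` and `demo` and all parameter values are illustrative.

```python
import math

def accel(f, g, cs, mu, lam, dx):
    """Acceleration of field f on a periodic 1-D grid:
    cs^2 * (discrete Laplacian of f) - mu^2 * f - dV/df,
    with the assumed interaction V = lam * (f^2 + g^2)^2."""
    n = len(f)
    out = []
    for i in range(n):
        lap = (f[(i + 1) % n] - 2.0 * f[i] + f[i - 1]) / dx**2
        out.append(cs**2 * lap - mu**2 * f[i]
                   - 4.0 * lam * (f[i]**2 + g[i]**2) * f[i])
    return out

def charge(a, b, adot, bdot, dx):
    """Discrete version of Q = integral of (b*adot - a*bdot) dx."""
    return dx * sum(bi * ad - ai * bd
                    for ai, bi, ad, bd in zip(a, b, adot, bdot))

def evolve(a, b, adot, bdot, cs, mu, lam, dx, dt, steps):
    """Kick-drift-kick (velocity Verlet) time stepping of both fields."""
    a, b, adot, bdot = list(a), list(b), list(adot), list(bdot)
    for _ in range(steps):
        fa, fb = accel(a, b, cs, mu, lam, dx), accel(b, a, cs, mu, lam, dx)
        adot = [v + 0.5 * dt * f for v, f in zip(adot, fa)]
        bdot = [v + 0.5 * dt * f for v, f in zip(bdot, fb)]
        a = [x + dt * v for x, v in zip(a, adot)]
        b = [x + dt * v for x, v in zip(b, bdot)]
        fa, fb = accel(a, b, cs, mu, lam, dx), accel(b, a, cs, mu, lam, dx)
        adot = [v + 0.5 * dt * f for v, f in zip(adot, fa)]
        bdot = [v + 0.5 * dt * f for v, f in zip(bdot, fb)]
    return a, b, adot, bdot

def demo(lam=0.1, n=32, steps=200):
    """Return the charge before and after evolving some arbitrary initial data."""
    cs, mu = 1.0, 1.0
    dx = 2.0 * math.pi / n
    dt = 0.25 * dx / cs
    xs = [i * dx for i in range(n)]
    a = [math.cos(x) for x in xs]
    b = [-math.sin(x) + 0.2 * math.cos(2 * x) for x in xs]
    adot = [math.sin(x) for x in xs]
    bdot = [math.cos(x) for x in xs]
    q0 = charge(a, b, adot, bdot, dx)
    q1 = charge(*evolve(a, b, adot, bdot, cs, mu, lam, dx, dt, steps), dx)
    return q0, q1

print(demo(lam=0.0), demo(lam=0.1))
```

With this particular scheme the discrete charge is preserved to rounding error, because neither the ‘kick’ nor the ‘drift’ sub-step changes $\sum (b\dot a - a\dot b)$ when $V$ depends only on $a^2 + b^2$.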

What this analysis shows is that a field theory for a pair of fields with identical mass parameter
$\mu$ can naturally accommodate two types of waves, which transport a conserved ‘charge’. These two
types of waves are the positive and negative frequency components of the complex field $\phi$. This is
rather different from the single real field, where the negative frequency Fourier components are just
the mirror image, with conjugation, of the positive frequency components. Here the positive and
negative frequency components are quite independent, and positive (negative) frequency components
carry positive (negative) ‘charge’. In the quantum theory of such a field, the corresponding quanta
are charged particles and anti-particles. Precisely what this ‘charge’ is depends, essentially, on the
interactions with other fields. It turns out that one can add an interaction with the electro-magnetic field, for
instance, so that the charge here is electric charge. However, this is not the only possibility, and the
conserved quantity could be some other entity. Other possibilities are to have a field with more than
two components, and in this way one can construct theories with more complicated conservation
laws.


16.9 Conservation of Particle Number 

In addition to energy and the three components of the wave-momentum — which are precisely 
conserved in interacting theories — a non-interacting field possesses a fifth conserved quantity, 
which corresponds to conservation of number of particles. We will first obtain this directly from the 
form of the solutions. We then show how the conservation law can be understood as arising from a 
symmetry of the free-field Lagrangian, when written in an appropriate manner. 

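The general solution of the free field equation of motion

$$\ddot\phi - c_s^2\nabla^2\phi + \mu^2\phi = 0 \qquad (16.86)$$

can be written as

$$\phi(\mathbf{x}, t) = \phi^+ + \phi^- = \sum_{\mathbf{k}} \phi_{\mathbf{k}} e^{i(\omega_k t - \mathbf{k}\cdot\mathbf{x})} + \sum_{\mathbf{k}} \phi^*_{\mathbf{k}} e^{-i(\omega_k t - \mathbf{k}\cdot\mathbf{x})}. \qquad (16.87)$$

The complex Fourier amplitudes can be related to the transform of the field and its time derivative
on some initial time-slice $t = t_0$ as

$$\phi_{\mathbf{k}} = \frac{1}{2}\left(\phi(\mathbf{k}, t_0) + \dot\phi(\mathbf{k}, t_0)/i\omega_k\right) e^{-i\omega_k t_0} \qquad (16.88)$$

where

$$\phi(\mathbf{k}, t) = \int d^3x\, \phi(\mathbf{x}, t) e^{i\mathbf{k}\cdot\mathbf{x}} \quad\text{and}\quad \dot\phi(\mathbf{k}, t) = \int d^3x\, \dot\phi(\mathbf{x}, t) e^{i\mathbf{k}\cdot\mathbf{x}}. \qquad (16.89)$$

Now consider the quantities

$$n(\mathbf{x}, t) = i(\phi^+\dot\phi^- - \phi^-\dot\phi^+) \qquad (16.90)$$

and

$$\mathbf{j}(\mathbf{x}, t) = -i c_s^2(\phi^+\nabla\phi^- - \phi^-\nabla\phi^+). \qquad (16.91)$$

Since $\phi^+$ and $\phi^-$ are complex conjugates of each other both $n$ and $\mathbf{j}$ are real. In terms of the Fourier
coefficients $\phi_{\mathbf{k}}$ they are

$$n(\mathbf{x}, t) = \sum_{\mathbf{k}}\sum_{\mathbf{k}'} \phi_{\mathbf{k}}\phi^*_{\mathbf{k}'} (\omega_k + \omega_{k'}) e^{i((\omega_k - \omega_{k'})t - (\mathbf{k} - \mathbf{k}')\cdot\mathbf{x})} \qquad (16.92)$$

and

$$\mathbf{j}(\mathbf{x}, t) = c_s^2 \sum_{\mathbf{k}}\sum_{\mathbf{k}'} \phi_{\mathbf{k}}\phi^*_{\mathbf{k}'} (\mathbf{k} + \mathbf{k}') e^{i((\omega_k - \omega_{k'})t - (\mathbf{k} - \mathbf{k}')\cdot\mathbf{x})}. \qquad (16.93)$$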

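The time derivative of $n(\mathbf{x}, t)$ is

$$\frac{\partial n}{\partial t} = i\sum_{\mathbf{k}}\sum_{\mathbf{k}'} \phi_{\mathbf{k}}\phi^*_{\mathbf{k}'} (\omega_k^2 - \omega_{k'}^2) e^{i((\omega_k - \omega_{k'})t - (\mathbf{k} - \mathbf{k}')\cdot\mathbf{x})} \qquad (16.94)$$

whereas the divergence of $\mathbf{j}(\mathbf{x}, t)$ is

$$\nabla\cdot\mathbf{j} = -i c_s^2 \sum_{\mathbf{k}}\sum_{\mathbf{k}'} \phi_{\mathbf{k}}\phi^*_{\mathbf{k}'} (k^2 - k'^2) e^{i((\omega_k - \omega_{k'})t - (\mathbf{k} - \mathbf{k}')\cdot\mathbf{x})}. \qquad (16.95)$$

However, the frequency satisfies $\omega_k^2 = c_s^2k^2 + \mu^2$, so $\omega_k^2 - \omega_{k'}^2 = c_s^2(k^2 - k'^2)$ and we have

$$\frac{\partial n}{\partial t} = -\nabla\cdot\mathbf{j}. \qquad (16.96)$$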


Thus we have a conservation law, relating the density (16.90) and corresponding current (16.91).
For a single plane wave, with amplitude $\phi_0$, the density and current follow from (16.92) and (16.93) as

$$n = 2\omega_k\phi_0^2, \qquad \mathbf{j} = 2c_s^2\phi_0^2\mathbf{k} \qquad (16.97)$$

so $(c_s n, \mathbf{j})$ form a four-vector (the amplitude of a wave being an invariant also). Note that the current
in this case is just equal to the density times the group velocity: $\mathbf{j} = \mathbf{v}_g n$.
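
The continuity equation and the relation $\mathbf{j} = \mathbf{v}_g n$ can be checked numerically. The following is a minimal sketch in one space dimension, assuming the double-sum forms $n = \sum\sum \phi_k\phi^*_{k'}(\omega_k + \omega_{k'})e^{i(\ldots)}$ and $j = c_s^2\sum\sum \phi_k\phi^*_{k'}(k + k')e^{i(\ldots)}$; the names `omega`, `density_and_current` and `continuity_residual`, the mode amplitudes, and the finite-difference step are illustrative choices.

```python
import cmath
import math

CS, MU = 1.0, 1.0  # assumed sound speed and mass parameter

def omega(k):
    """Free-field dispersion relation omega_k = sqrt(cs^2 k^2 + mu^2)."""
    return math.sqrt(CS**2 * k**2 + MU**2)

def density_and_current(modes, x, t):
    """n and j from the double sums over Fourier amplitudes, in 1-D.
    `modes` is a list of (k, complex amplitude) pairs."""
    n = 0.0
    j = 0.0
    for k1, a1 in modes:
        for k2, a2 in modes:
            phase = cmath.exp(1j * ((omega(k1) - omega(k2)) * t - (k1 - k2) * x))
            term = a1 * a2.conjugate() * phase
            # The full double sums are real, so only real parts contribute.
            n += (term * (omega(k1) + omega(k2))).real
            j += (term * CS**2 * (k1 + k2)).real
    return n, j

def continuity_residual(modes, x, t, h=1e-4):
    """Central-difference estimate of dn/dt + dj/dx, which should vanish."""
    dn_dt = (density_and_current(modes, x, t + h)[0]
             - density_and_current(modes, x, t - h)[0]) / (2 * h)
    dj_dx = (density_and_current(modes, x + h, t)[1]
             - density_and_current(modes, x - h, t)[1]) / (2 * h)
    return dn_dt + dj_dx

modes = [(0.5, 1.0 + 0.0j), (2.0, 0.3 - 0.4j)]
print(continuity_residual(modes, x=0.7, t=1.3))
```

For a single mode the same function returns a current equal to the density times $c_s^2 k/\omega_k$, the group velocity.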

The same is true for a large wave-packet, for which the global conserved quantity is

$$N = \int d^3x\, n(\mathbf{x}, t) = \omega_k \int d^3x\, \phi^2. \qquad (16.98)$$

As already discussed, this quantity corresponds to the particle number.

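Another interesting model is a statistically homogeneous random field, for which $\langle\phi_{\mathbf{k}}\phi^*_{\mathbf{k}'}\rangle = P_\phi(\mathbf{k})\delta_{\mathbf{k}\mathbf{k}'}$
where $P_\phi$ is the power spectrum. This is working in a periodic box, where the Fourier
modes are discrete. If we think of the modes as being continuous, this becomes $\langle\phi_{\mathbf{k}}\phi^*_{\mathbf{k}'}\rangle = (2\pi)^3 P_\phi(\mathbf{k})\delta(\mathbf{k} - \mathbf{k}')$.
The expectation values of the density and current then follow from (16.92) and (16.93) as

$$\langle n\rangle = 2\sum_{\mathbf{k}} \langle|\phi_{\mathbf{k}}|^2\rangle\omega_k \to 2\int \frac{d^3k}{(2\pi)^3} P_\phi(\mathbf{k})\omega_k, \qquad \langle\mathbf{j}\rangle = 2c_s^2\sum_{\mathbf{k}} \langle|\phi_{\mathbf{k}}|^2\rangle\mathbf{k} \to 2c_s^2\int \frac{d^3k}{(2\pi)^3} P_\phi(\mathbf{k})\mathbf{k}. \qquad (16.99)$$

The quantity $\omega_k P_\phi(\mathbf{k})$ plays the role, in wave mechanics, of the phase space density $f(\mathbf{p})$ in
particle dynamics. Like $f(\mathbf{p})$ it is invariant under boosts (see below). Integrating over all momenta
(i.e. all values of wave-number $\mathbf{k}$) gives the particle number density; multiplying by $\mathbf{k}$ and
integrating gives the momentum density and multiplying by $\omega_k$ and integrating gives the energy
density.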

The conservation law (16.96) was obtained directly from the properties of the solution of the
field equations. It is also interesting to see how this emerges as a consequence of a symmetry of the
Lagrangian. With $\phi = \phi^+ + \phi^-$ the Lagrangian density is

$$\mathcal{L} = \frac{\rho}{2}\left((\dot\phi^+)^2 - c_s^2(\nabla\phi^+)^2 - \mu^2(\phi^+)^2\right) + \rho\left(\dot\phi^+\dot\phi^- - c_s^2\nabla\phi^+\cdot\nabla\phi^- - \mu^2\phi^+\phi^-\right) + \frac{\rho}{2}\left((\dot\phi^-)^2 - c_s^2(\nabla\phi^-)^2 - \mu^2(\phi^-)^2\right). \qquad (16.100)$$

If we vary either $\phi^+$ or $\phi^-$ we obtain the same equation of motion:

$$\ddot\phi^+ - c_s^2\nabla^2\phi^+ + \mu^2\phi^+ + \ddot\phi^- - c_s^2\nabla^2\phi^- + \mu^2\phi^- = 0. \qquad (16.101)$$

This is just the equation of motion for $\phi$, with $\phi \to \phi^+ + \phi^-$. However, consider the Lagrangian
density

$$\mathcal{L}' = \rho\left(\dot\phi^+\dot\phi^- - c_s^2\nabla\phi^+\cdot\nabla\phi^- - \mu^2\phi^+\phi^-\right). \qquad (16.102)$$

Varying $\phi^+$ and $\phi^-$ now yields the field equations

$$\ddot\phi^+ - c_s^2\nabla^2\phi^+ + \mu^2\phi^+ = 0 \quad\text{and}\quad \ddot\phi^- - c_s^2\nabla^2\phi^- + \mu^2\phi^- = 0. \qquad (16.103)$$
But $\phi^+$ and $\phi^-$ are complex conjugates, so any field $\phi^+(\mathbf{x}, t)$ which is a solution of the field equations
derived from $\mathcal{L}'$ is automatically a solution of the field equations derived from $\mathcal{L}$. Thus, we can
take the Lagrangian density for the system to be $\mathcal{L}'$ given by (16.102), since it generates equivalent
solutions. Now this Lagrangian, like that for $\phi$, has no explicit time or position dependence, so energy
and the three components of momentum are all conserved. However, (16.102) has an additional
symmetry: it is invariant under the global gauge transformation $\phi^+ \to \phi^+ e^{i\theta}$ and $\phi^- \to \phi^- e^{-i\theta}$.
This transformation corresponds to variation of the choice of initial time slice ($t = 0$) on which we
determine $\phi^+$ and $\phi^-$. The conservation law (16.96) can readily be obtained by requiring that the
total derivative of the Lagrangian density with respect to $\theta$ vanish, just as we did to obtain charge
conservation for the complex field.

It should not come as a surprise that the free field has more conserved quantities than just the
energy and momentum density. The Fourier amplitudes $\phi_{\mathbf{k}}$ can be determined from the displacement
and velocity data $\phi(\mathbf{x}, t)$, $\dot\phi(\mathbf{x}, t)$ on any time slice, so there are in effect a triply infinite number
of conserved quantities. However, this behaviour is specific to the free field. For the free field,
the 4-dimensional Fourier modes are confined to the 3-surface $\omega_k^2 = c_s^2k^2 + \mu^2$, so the information
content in the 4-dimensional field $\phi(\mathbf{x}, t)$ is really only 3-dimensional. If we admit interactions, this
will no longer be the case. A necessary consequence of any interactions is that the energy shell will
‘fuzz-out’ somewhat, and it is no longer possible to determine the entire space-time behaviour from
measurements on one time slice.

Consider adding an interaction term $\mathcal{L}_{\rm int} = \rho\lambda\phi^4$ to the free-field Lagrangian. In terms of $\phi^+$,
$\phi^-$ this is

$$\mathcal{L}_{\rm int} = \rho\lambda\left((\phi^+)^4 + 4(\phi^+)^3\phi^- + 6(\phi^+)^2(\phi^-)^2 + 4\phi^+(\phi^-)^3 + (\phi^-)^4\right). \qquad (16.104)$$

However, only the central term here respects the global gauge transformation symmetry, so adding
interactions violates conservation of particle number.


16.10 Particle Number Conservation at Low Energies 

We saw in the previous section that in the wave mechanics of a continuous medium there are four
exactly conserved quantities (the energy and the three components of the wave-momentum)
but that particle number is only conserved for non-interacting fields. Now particle number is also
violated in interactions of high energy particles; if we collide very energetic (i.e. highly relativistic)
particles then there is ample energy to create new particles that were not present in the initial state.
However, if we collide particles with kinetic energy $E \ll m_0c^2$, then there is not enough energy to
overcome the threshold to produce new particles, so particle number is conserved at low energies.
We will now see how particle number is conserved in a classical interacting field theory, in the
limit that the wavelength is large compared to $\lambda_* = 2\pi/k_* = 2\pi c_s/\mu$. For the BRS model this
length scale plays the role of the Compton wavelength, so the condition $\lambda \gg \lambda_*$ corresponds to
the highly non-relativistic limit. In essence, what happens is that in the limit $k \ll k_*$ the field
$\phi$ has rapid temporal oscillation at frequency $\omega \simeq \mu$ with a relatively slowly varying ‘envelope’.
The evolution of this envelope, it turns out, is governed by an equation which is very similar to
the Schroedinger equation. The correspondence principle, which states that non-relativistic particle
dynamics phenomena can be equally well described by Schroedinger’s wave mechanics, then carries
over, with a little modification, to the scalar elasticity waves.

Consider first non-interacting waves ($\lambda = 0$), for which the wave equation is

$$\ddot\phi - c_s^2\nabla^2\phi + \mu^2\phi = 0. \qquad (16.105)$$

In the limit we are considering, i.e. $c_s k \ll \mu$, the second term is much smaller in magnitude than
the third, so the first and third terms must be nearly equal in magnitude. If we neglect the second term entirely,
the equation of motion is $\ddot\phi + \mu^2\phi = 0$, the solutions of which are $\phi(\mathbf{x}, t) = \phi_0(\mathbf{x})\exp(\pm i\mu t)$: i.e. the
field just sits there without moving, aside from wiggling up and down at frequency $\omega = \mu$. What
we would like to do is to divide out the rapid time oscillation and develop an equation of motion for
the relatively slowly varying modulating function. This is a little tricky since a general $\phi$ field will
contain both positive and negative frequencies: $\phi = \phi^+ + \phi^-$ as in (16.87). If the spatial frequency
of the waves is $k < k_{\rm max}$, then the positive frequency part of the wave is limited to a narrow band
of temporal frequencies $\mu < \omega < \mu\sqrt{1 + c_s^2k_{\rm max}^2/\mu^2}$, and similarly for the negative frequency modes.
Thus, we can let

$$\phi(\mathbf{x}, t) = \psi(\mathbf{x}, t)e^{i\mu t} + \psi^*(\mathbf{x}, t)e^{-i\mu t} \qquad (16.106)$$

where $\psi(\mathbf{x}, t)$ and its conjugate are band-limited with temporal frequencies in the range $0 < \omega < c_s^2k_{\rm max}^2/2\mu$.
The free field Lagrangian density is then


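$$\begin{aligned}
\mathcal{L} &= \frac{\rho}{2}\left(\dot\phi^2 - c_s^2(\nabla\phi)^2 - \mu^2\phi^2\right) \\
&= \frac{\rho}{2}e^{2i\mu t}\left[(\dot\psi + i\mu\psi)^2 - c_s^2(\nabla\psi)^2 - \mu^2\psi^2\right] \\
&\quad + \rho\left[(\dot\psi + i\mu\psi)(\dot\psi^* - i\mu\psi^*) - c_s^2\nabla\psi\cdot\nabla\psi^* - \mu^2\psi\psi^*\right] \\
&\quad + \frac{\rho}{2}e^{-2i\mu t}\left[(\dot\psi^* - i\mu\psi^*)^2 - c_s^2(\nabla\psi^*)^2 - \mu^2\psi^{*2}\right],
\end{aligned} \qquad (16.107)$$

and for a $\lambda\phi^4$ interaction term this is augmented with

$$\mathcal{L}_{\rm int} = -\rho\lambda\phi^4 = -\rho\lambda\left(\psi^4e^{4i\mu t} + 4\psi^3\psi^*e^{2i\mu t} + 6\psi^2\psi^{*2} + 4\psi\psi^{*3}e^{-2i\mu t} + \psi^{*4}e^{-4i\mu t}\right). \qquad (16.108)$$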


Thus the transformation (16.106) has resulted in an explicit time dependence in the Lagrangian,
with terms containing rapidly oscillating factors $e^{\pm 2i\mu t}$ and $e^{\pm 4i\mu t}$. However, and this is a key
point, if we are dealing with low spatial frequency waves $k < k_{\rm max}$, so the temporal frequency for
the $\psi$ field is $\omega < c_s^2k_{\rm max}^2/2\mu \ll \mu$, the contribution to the action $S = \int dt \int d^3x\, \mathcal{L}$ from these terms is
very small since all the other factors are relatively very slowly varying. Thus, for $k \ll \mu/c_s$ we can
ignore most of the terms above and use the effective Lagrangian

$$\mathcal{L} = \rho\left[\dot\psi\dot\psi^* + i\mu(\psi\dot\psi^* - \psi^*\dot\psi) - c_s^2\nabla\psi\cdot\nabla\psi^* - 6\lambda\psi^2\psi^{*2}\right]. \qquad (16.109)$$

This can be further simplified, since, in the regime we are considering, $\dot\psi \ll \mu\psi$ so we can neglect
the first term as compared to the second. We can therefore take the Lagrangian to be

$$\mathcal{L} = \rho\left[i\mu(\psi\dot\psi^* - \psi^*\dot\psi) - c_s^2\nabla\psi\cdot\nabla\psi^* - 6\lambda\psi^2\psi^{*2}\right]. \qquad (16.110)$$

The equation of motion is obtained, as always, by requiring that the action be stationary with
respect to variation of the fields $\psi$, $\psi^*$. For the variation $\psi^* \to \psi^* + \delta\psi^*$ this yields

$$\frac{\partial(\partial\mathcal{L}/\partial\dot\psi^*)}{\partial t} + \frac{\partial(\partial\mathcal{L}/\partial\psi^*_{,i})}{\partial x_i} - \partial\mathcal{L}/\partial\psi^* = 0 \qquad (16.111)$$

or equivalently

$$i\dot\psi - \frac{c_s^2}{2\mu}\nabla^2\psi + \frac{6\lambda}{\mu}\psi^2\psi^* = 0 \qquad (16.112)$$

and the variation $\psi \to \psi + \delta\psi$ yields the complex conjugate of (16.112). This is very similar in
form to the time dependent Schroedinger equation for the wave-function $\psi(\mathbf{x}, t)$ for a particle in a
potential, but with the dynamical quantity $6\lambda\psi\psi^*/\mu$ in place of the potential $V(\mathbf{x})$. We will discuss
this connection further below.
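
How good the envelope approximation is can be gauged from the dispersion relation. The following is a minimal sketch: it assumes the free ($\lambda = 0$) envelope equation $i\dot\psi = (c_s^2/2\mu)\nabla^2\psi$, whose plane-wave solutions $\psi \propto e^{i(\Omega t - \mathbf{k}\cdot\mathbf{x})}$ have $\Omega = c_s^2k^2/2\mu$, and compares this with the exact envelope frequency $\omega_k - \mu$. The function names and sample wave-numbers are illustrative.

```python
import math

def envelope_frequency_exact(k, cs=1.0, mu=1.0):
    """Exact envelope frequency omega_k - mu from omega_k^2 = cs^2 k^2 + mu^2."""
    return math.sqrt(cs**2 * k**2 + mu**2) - mu

def envelope_frequency_schroedinger(k, cs=1.0, mu=1.0):
    """Envelope frequency cs^2 k^2 / (2 mu) implied by the Schroedinger-like equation."""
    return cs**2 * k**2 / (2.0 * mu)

def relative_error(k, cs=1.0, mu=1.0):
    """Fractional error of the low-energy approximation at wave-number k."""
    exact = envelope_frequency_exact(k, cs, mu)
    return (envelope_frequency_schroedinger(k, cs, mu) - exact) / exact

for k in (0.01, 0.1, 0.5):
    print(k, relative_error(k))
```

The fractional error grows roughly as $(c_sk/\mu)^2/4$, so the approximation is good precisely in the regime $k \ll \mu/c_s$ considered here.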


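The Lagrangian (16.110), like (16.102), has no explicit dependence on $t$ or on $\mathbf{x}$, so energy and
momentum are conserved, and it is also symmetric under the global gauge transformation $\psi \to \psi e^{i\theta}$
and $\psi^* \to \psi^*e^{-i\theta}$, so it therefore also has a conserved particle number.

Conservation of particle number follows from the vanishing of the total derivative of the
Lagrangian with respect to the global gauge transformation parameter:

$$\frac{d\mathcal{L}}{d\theta} = \frac{\partial\mathcal{L}}{\partial\psi}\frac{\partial\psi}{\partial\theta} + \frac{\partial\mathcal{L}}{\partial\dot\psi}\frac{\partial\dot\psi}{\partial\theta} + \frac{\partial\mathcal{L}}{\partial\psi_{,i}}\frac{\partial\psi_{,i}}{\partial\theta} + \ldots\psi \to \psi^*\ldots = 0. \qquad (16.113)$$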
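Using the Euler-Lagrange equation to replace $\partial\mathcal{L}/\partial\psi$ by $\partial(\partial\mathcal{L}/\partial\dot\psi)/\partial t + \partial(\partial\mathcal{L}/\partial\psi_{,i})/\partial x^i$ and with
$\partial\psi/\partial\theta = i\psi$, $\partial\dot\psi/\partial\theta = i\dot\psi$ etc. this becomes

$$\frac{\partial}{\partial t}\left(i\frac{\partial\mathcal{L}}{\partial\dot\psi}\psi - i\frac{\partial\mathcal{L}}{\partial\dot\psi^*}\psi^*\right) = -\frac{\partial}{\partial x^i}\left(i\frac{\partial\mathcal{L}}{\partial\psi_{,i}}\psi - i\frac{\partial\mathcal{L}}{\partial\psi^*_{,i}}\psi^*\right) \qquad (16.114)$$

or

$$\frac{\partial n}{\partial t} = -\nabla\cdot\mathbf{j} \qquad (16.115)$$

where the particle number density is

$$n = 2\rho\mu\psi\psi^* \qquad (16.116)$$

and the particle flux density is

$$\mathbf{j} = i\rho c_s^2(\psi^*\nabla\psi - \psi\nabla\psi^*). \qquad (16.117)$$

Conservation of momentum is obtained from the total spatial derivative of the Lagrangian
density:

$$\frac{d\mathcal{L}}{dx^i} = \frac{\partial\mathcal{L}}{\partial\psi}\frac{\partial\psi}{\partial x^i} + \frac{\partial\mathcal{L}}{\partial\dot\psi}\frac{\partial\dot\psi}{\partial x^i} + \frac{\partial\mathcal{L}}{\partial\psi_{,j}}\frac{\partial\psi_{,j}}{\partial x^i} + \ldots\psi \to \psi^*\ldots \qquad (16.118)$$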


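Using the Euler-Lagrange equation as above and using the commutation property of partial
derivatives this becomes

$$\frac{\partial}{\partial t}\left(-\frac{\partial\mathcal{L}}{\partial\dot\psi}\psi_{,i} - \frac{\partial\mathcal{L}}{\partial\dot\psi^*}\psi^*_{,i}\right) = -\frac{\partial}{\partial x_j}\left[-\frac{\partial\mathcal{L}}{\partial\psi_{,j}}\psi_{,i} - \frac{\partial\mathcal{L}}{\partial\psi^*_{,j}}\psi^*_{,i} + \delta_{ij}\mathcal{L}\right] \qquad (16.119)$$

or

$$\frac{\partial p^i}{\partial t} = -\frac{\partial w^{ij}}{\partial x_j} \qquad (16.120)$$

where the momentum density is

$$\mathbf{p} = i\rho\mu(\psi^*\nabla\psi - \psi\nabla\psi^*) \qquad (16.121)$$

and the momentum flux tensor, after using the equations of motion to evaluate $\mathcal{L}$, is

$$w^{ij} = \rho c_s^2\left(\psi_{,i}\psi^*_{,j} + \psi_{,j}\psi^*_{,i} - \delta^{ij}\left[\tfrac{1}{2}\nabla^2(\psi\psi^*) - \frac{6\lambda}{c_s^2}\psi^2\psi^{*2}\right]\right). \qquad (16.122)$$

Conservation of energy is obtained from the total time derivative of the Lagrangian density:

$$\frac{d\mathcal{L}}{dt} = \frac{\partial\mathcal{L}}{\partial\psi}\frac{\partial\psi}{\partial t} + \frac{\partial\mathcal{L}}{\partial\dot\psi}\frac{\partial\dot\psi}{\partial t} + \frac{\partial\mathcal{L}}{\partial\psi_{,j}}\frac{\partial\psi_{,j}}{\partial t} + \ldots\psi \to \psi^*\ldots \qquad (16.123)$$

Again, using the Euler-Lagrange equations and the commutation property of partial derivatives, this
becomes

$$\frac{\partial}{\partial t}\left(\frac{\partial\mathcal{L}}{\partial\dot\psi}\dot\psi + \frac{\partial\mathcal{L}}{\partial\dot\psi^*}\dot\psi^* - \mathcal{L}\right) = -\frac{\partial}{\partial x_j}\left(\frac{\partial\mathcal{L}}{\partial\psi_{,j}}\dot\psi + \frac{\partial\mathcal{L}}{\partial\psi^*_{,j}}\dot\psi^*\right). \qquad (16.124)$$

Using the equations of motion to replace $i\dot\psi$ by $c_s^2\nabla^2\psi/2\mu - 6\lambda\psi^2\psi^*/\mu$ and similarly for $i\dot\psi^*$ this
becomes

$$\frac{\partial\epsilon}{\partial t} = -\nabla\cdot\mathbf{F} \qquad (16.125)$$

where the energy density is

$$\epsilon = \rho\left(c_s^2\nabla\psi\cdot\nabla\psi^* + 6\lambda\psi^2\psi^{*2}\right) \qquad (16.126)$$

and the energy flux vector is

$$\mathbf{F} = i\frac{\rho c_s^2}{\mu}\left[\frac{c_s^2}{2}\left(\nabla\psi^*\nabla^2\psi - \nabla\psi\nabla^2\psi^*\right) + 6\lambda\psi\psi^*(\psi^*\nabla\psi - \psi\nabla\psi^*)\right]. \qquad (16.127)$$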
To summarize, the conservation laws are

$$\begin{aligned}
\text{number}: &\quad \partial n/\partial t = -\nabla\cdot\mathbf{j} \\
\text{momentum}: &\quad \partial p^i/\partial t = -\partial w^{ij}/\partial x^j \\
\text{energy}: &\quad \partial\epsilon/\partial t = -\nabla\cdot\mathbf{F}
\end{aligned} \qquad (16.128)$$

where

$$\begin{aligned}
n &= 2\rho\mu\psi\psi^* \\
\mathbf{j} &= i\rho c_s^2(\psi^*\nabla\psi - \psi\nabla\psi^*) \\
\mathbf{p} &= i\rho\mu(\psi^*\nabla\psi - \psi\nabla\psi^*) \\
w^{ij} &= \rho c_s^2\left(\psi_{,i}\psi^*_{,j} + \psi_{,j}\psi^*_{,i} - \delta^{ij}\left[\tfrac{1}{2}\nabla^2(\psi\psi^*) - \frac{6\lambda}{c_s^2}\psi^2\psi^{*2}\right]\right) \\
\epsilon &= \rho\left(c_s^2\nabla\psi\cdot\nabla\psi^* + 6\lambda\psi^2\psi^{*2}\right) \\
\mathbf{F} &= i\frac{\rho c_s^2}{\mu}\left[\frac{c_s^2}{2}\left(\nabla\psi^*\nabla^2\psi - \nabla\psi\nabla^2\psi^*\right) + 6\lambda\psi\psi^*(\psi^*\nabla\psi - \psi\nabla\psi^*)\right]
\end{aligned} \qquad (16.129)$$

There is a close parallel here with Schroedinger’s development of quantum mechanics, where the
state for a particle is represented by a ‘wave-function’ $\psi(\mathbf{x}, t)$. The density $n$ is, aside from a constant,
the probability density. The form of the particle current is also familiar from non-relativistic quantum
theory. The globally conserved quantities are

$$\begin{aligned}
N &= \int d^3x\, n = 2\rho\mu\int d^3x\, \psi^*\psi \\
\mathbf{P} &= \int d^3x\, \mathbf{p} = i\rho\mu\int d^3x\, (\psi^*\nabla\psi - \psi\nabla\psi^*) = 2\rho\mu\int d^3x\, \psi^*\, i\nabla\psi \\
E &= \int d^3x\, \epsilon = \rho c_s^2\int d^3x\, \nabla\psi^*\cdot\nabla\psi = 2\rho\mu\int d^3x\, \psi^*\left(-\frac{c_s^2\nabla^2}{2\mu}\right)\psi
\end{aligned} \qquad (16.130)$$

where in the last equation we have neglected the contribution to the energy density from the
interaction. If $\psi$ were a wave-function, then the second line above would be the expectation value of
the momentum operator $\mathbf{p} \propto i\nabla$ in the state $\psi$. The last line, similarly, is proportional to the
expectation value of the Hamiltonian operator $H = p^2/2m$. The low-energy limit of the free-field
scalar-elasticity equation of motion is exactly equivalent to the Schroedinger equation for a free
particle. For the interacting field, the equation of motion is formally similar to the Schroedinger
equation for a particle in a potential $V = 6\lambda\psi^*\psi/\mu$. However, this analogy should not be taken
too far, since this potential is both time and position dependent, which properties generally destroy
energy and momentum conservation.


16.11 Ideal Fluid Limit of Wave Mechanics 

As an example of the power of the energy and momentum conservation laws, consider a 3-dimensional
field $\phi(\mathbf{x}, t)$ of the type we have been discussing. Let us assume that the field has a self-interaction,
let’s say a $\lambda\phi^4$ interaction term, but which is sufficiently weak that the waves can be locally
approximated as a sum of free-field traveling waves. Now imagine we inject some wave energy into this
system in some arbitrary and inhomogeneous manner. This is analogous to throwing a brick into a
swimming pool; we may initially have an organized disturbance, but after a little time interactions
between the waves and the walls of the pool will randomly distribute the energy among the various
modes, and the result is a Gaussian random wave field. Such a field can be realized by summing
waves with random complex amplitudes $\phi(\mathbf{k})$ (or random real amplitudes and random phases) and
is fully characterized by the power spectrum $P_\phi(\mathbf{k}) = \langle|\phi_{\mathbf{k}}|^2\rangle$. Here the interactions between wave
modes will similarly effectively randomize the phases and amplitudes of the waves, and the system
will relax to a locally homogeneous state (the situation here differs slightly from the swimming pool
example in that there the interactions with the wall do not respect wave momentum conservation,
so the power spectrum for waves in the pool becomes isotropic).

Now while wave interactions can efficiently locally homogenize the wave energy spectrum, large
scale inhomogeneities in the energy and momentum distribution will take a very long time to be
erased. This is very similar to the situation in gas dynamics, where collisions can rapidly render the
velocity distribution locally Maxwellian, but where large scale inhomogeneities (sound waves, for
instance) take a very long time to damp out. What we will develop here is a system of equations
which describe the spatio-temporal evolution of the wave energy and momentum. These turn out to
be identical to the ideal fluid equations for a collisional gas of relativistic particles.
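16.11.1 Local Average Stress-Energy Tensor

We have already obtained the energy and momentum densities $\epsilon = \dot\phi\,\partial\mathcal{L}/\partial\dot\phi - \mathcal{L}$ and $\mathbf{p} = -\nabla\phi\,\partial\mathcal{L}/\partial\dot\phi$,
which for the system considered here become

$$\epsilon = \frac{\rho}{2}\left(\dot\phi^2 + c_s^2(\nabla\phi)^2 + \mu^2\phi^2\right) \qquad (16.131)$$

and

$$\mathbf{p} = -\rho\dot\phi\nabla\phi. \qquad (16.132)$$

These are fluctuating quantities, with coherence length- and time-scales $\sim 1/k$ and $1/\omega_k$ respectively.
Let’s average the energy and momentum densities and flux densities in some region of space-time of
size $L \gg 1/k$ and duration $T \gg 1/\omega_k$. The average energy density is

$$\langle\epsilon\rangle = \frac{\rho}{2}\left(\langle\dot\phi^2\rangle + c_s^2\langle(\nabla\phi)^2\rangle + \mu^2\langle\phi^2\rangle\right). \qquad (16.133)$$

Each of these expectation values can be written as an integral over wave-number involving the power
spectrum. For example, $\langle\dot\phi^2\rangle = (2\pi)^{-3}\int d^3k\, \omega_k^2P_\phi(\mathbf{k})$, and therefore

$$\langle\epsilon\rangle = \frac{\rho}{2}\int\frac{d^3k}{(2\pi)^3}\left(\omega_k^2 + c_s^2k^2 + \mu^2\right)P_\phi(\mathbf{k}) \qquad (16.134)$$

but the waves are assumed to obey the free-field dispersion relation to a very good approximation,
so the term in parentheses is just twice $\omega_k^2$ and the mean energy density is

$$\langle\epsilon\rangle = \rho\int\frac{d^3k}{(2\pi)^3}\omega_k^2P_\phi(\mathbf{k}). \qquad (16.135)$$

Similarly, the mean momentum density is

$$\langle\mathbf{p}\rangle = -\left\langle\nabla\phi\frac{\partial\mathcal{L}}{\partial\dot\phi}\right\rangle = -\rho\langle\dot\phi\nabla\phi\rangle = \rho\int\frac{d^3k}{(2\pi)^3}\omega_k\mathbf{k}P_\phi(\mathbf{k}). \qquad (16.136)$$

The energy flux is $F_i = \dot\phi\,\partial\mathcal{L}/\partial\phi_{,i}$, so the mean energy flux is

$$\langle\mathbf{F}\rangle = \left\langle\dot\phi\frac{\partial\mathcal{L}}{\partial\nabla\phi}\right\rangle = -\rho c_s^2\langle\dot\phi\nabla\phi\rangle = \rho c_s^2\int\frac{d^3k}{(2\pi)^3}\omega_k\mathbf{k}P_\phi(\mathbf{k}). \qquad (16.137)$$

Finally, the average momentum flux tensor is

$$\langle w_{ij}\rangle = \frac{\rho}{2}\delta_{ij}\left\langle\dot\phi^2 - c_s^2(\nabla\phi)^2 - \mu^2\phi^2\right\rangle + \rho c_s^2\langle\phi_{,i}\phi_{,j}\rangle. \qquad (16.138)$$

The first term vanishes by virtue of the dispersion relation, so

$$\langle w_{ij}\rangle = \rho c_s^2\int\frac{d^3k}{(2\pi)^3}k_ik_jP_\phi(\mathbf{k}). \qquad (16.139)$$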


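The contribution to the energy density from wave modes in $d^3k$ is $d^3\epsilon = \rho\, d^3k\, \omega_k^2P_\phi(\mathbf{k})/(2\pi)^3$,
so the quantities here are energy weighted averages. The energy flux density, for example, is the
average of $c_s^2\mathbf{k}/\omega_k$. But $c_s^2\mathbf{k}/\omega_k = \mathbf{v}_g(\mathbf{k})$, the group velocity for waves with wave-number $\mathbf{k}$, so $\langle\mathbf{F}\rangle$ is
$\langle\epsilon\rangle$ times the energy weighted average group velocity. Similarly, the average momentum flux tensor
$\langle w_{ij}\rangle$ is $\langle\epsilon\rangle$ times the energy weighted group velocity dispersion tensor, divided by $c_s^2$.

We can combine the various factors in the $4\times 4$ matrix

$$T^{\mu\nu} = \begin{bmatrix}\langle\epsilon\rangle & c_s\langle\mathbf{p}\rangle \\ c_s\langle\mathbf{p}\rangle & \langle w_{ij}\rangle\end{bmatrix} = \rho c_s^2\int\frac{d^3k}{(2\pi)^3}P_\phi(\mathbf{k})k^\mu k^\nu \qquad (16.140)$$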
where the four-momentum (or four-wave-number) is $\vec{k} = (\omega_k/c_s, \mathbf{k})$.

Now recall that solutions of the wave equations satisfy a covariance under Lorentz-like boost
transformations. If we have a solution $\phi(\vec{x}) = \phi(\mathbf{x}, t)$ then we can generate a 3-dimensional family
of solutions $\phi'(\vec{x}) = \phi(\Lambda\vec{x})$ where $\Lambda$ is the transformation matrix for a boost, parameterized by the
boost velocity $\mathbf{v} = \boldsymbol{\beta}c_s$. The transformation for a boost along the $x$-axis is a shearing in the $x$-$t$
plane. Now $d^3k\, P_\phi(\mathbf{k})/(2\pi)^3$ is the contribution to the variance $\langle\phi^2\rangle$ from the region $d^3k$ about $\mathbf{k}$.
Since the variance is invariant under boosts (since $\phi$ is invariant) so also is $d^3k\, P_\phi(\mathbf{k})$. This means
that the matrix $T^{\mu\nu}$ is in fact a 4-tensor; i.e. its components transform like $k^\mu k^\nu$ under a boost. It
is called the stress energy tensor.

This tensor takes a particularly simple form if we apply a boost such that the momentum
density for the transformed field vanishes. If there is no net momentum, the effect of collisions
is to produce a spherically symmetric, or isotropic, power spectrum $P_\phi(\mathbf{k}) = P_\phi(k)$, and the
stress-energy tensor is therefore

$$T^{\mu\nu} = \begin{bmatrix}\epsilon_0 & & & \\ & P & & \\ & & P & \\ & & & P\end{bmatrix} \qquad (16.141)$$

where $\epsilon_0$ is the energy density of the transformed field, and we have used

$$\rho c_s^2\int\frac{d^3k}{(2\pi)^3}k_ik_jP_\phi(k) = \frac{\delta_{ij}}{3}\rho c_s^2\int\frac{d^3k}{(2\pi)^3}k^2P_\phi(k) = \epsilon_0\langle|\mathbf{v}_g|^2\rangle\delta_{ij}/3c_s^2 \qquad (16.142)$$

to define the pressure

$$P = \epsilon_0\langle|\mathbf{v}_g|^2\rangle/3c_s^2. \qquad (16.143)$$

The pressure is therefore essentially the energy weighted mean square group velocity.

The stress-energy tensor for the actual field can then be obtained by applying the boost
transformation matrix to (16.141). For a boost along the $x$-axis, for instance,

$$T^{\mu\nu} = \begin{bmatrix}\gamma^2(\epsilon_0 + \beta^2P) & \beta\gamma^2(\epsilon_0 + P) & & \\ \beta\gamma^2(\epsilon_0 + P) & \gamma^2(\beta^2\epsilon_0 + P) & & \\ & & P & \\ & & & P\end{bmatrix} \qquad (16.144)$$

and we can then read off the components of the energy density, momentum density, momentum
flux etc. The actual energy density is $\epsilon = \gamma^2(\epsilon_0 + \beta^2P)$ and the actual momentum density is
$c_sp = \beta\gamma^2(\epsilon_0 + P)$. The latter can also be written as $c_sp = \beta(\epsilon + P)$. Thus given some actual
$T^{\mu\nu}$ we can determine the parameters $\beta$, $\epsilon_0$ and $P$ as follows: 1) Find the 3-rotation matrix that
aligns the momentum $\mathbf{p}$ with the $x$-axis. 2) Read off the pressure $P$ from either $T^{22}$ or $T^{33}$. 3)
Compute the dimensionless velocity $\beta = c_sp/(\epsilon + P)$. This also provides $\gamma = 1/\sqrt{1 - \beta^2}$. 4) Solve
$\epsilon = \gamma^2(\epsilon_0 + \beta^2P)$ for $\epsilon_0$.
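
Steps 2 to 4 of this recipe can be sketched in a few lines. The following is a minimal illustration, assuming the momentum has already been aligned with the $x$-axis (step 1) and using $\gamma = 1/\sqrt{1 - \beta^2}$; the function names `boosted` and `recover_rest_frame` and the sample numbers are hypothetical.

```python
import math

def boosted(eps0, pressure, beta):
    """Forward relations: eps = gamma^2 (eps0 + beta^2 P) and
    c_s p = beta gamma^2 (eps0 + P)."""
    g2 = 1.0 / (1.0 - beta**2)
    return g2 * (eps0 + beta**2 * pressure), beta * g2 * (eps0 + pressure)

def recover_rest_frame(eps, cs_p, pressure):
    """Given the energy density eps, the momentum density times c_s (cs_p,
    assumed aligned with the x-axis) and the pressure read off from T^22
    or T^33, return (beta, gamma, eps0)."""
    beta = cs_p / (eps + pressure)            # step 3
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    eps0 = eps / gamma**2 - beta**2 * pressure  # step 4
    return beta, gamma, eps0

# Round trip: boost a rest-frame state, then recover its parameters.
eps, cs_p = boosted(eps0=2.0, pressure=0.5, beta=0.6)
print(recover_rest_frame(eps, cs_p, 0.5))
```

The round trip returns the rest-frame energy density and boost parameter that were put in, which is a quick consistency check on the relations above.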

An alternative, but particularly useful, form for the stress-energy tensor can be obtained if we
define the 4-velocity $\vec{u} = (\gamma c_s, \gamma\mathbf{v})$. This is exactly analogous to the 4-velocity in special relativity,
but with the asymptotic sound speed $c_s$ in place of the speed of light, and has all the familiar
properties: e.g. $\vec{u}\cdot\vec{u} = u^\mu u_\mu = -c_s^2$, where $u_\mu = \eta_{\mu\nu}u^\nu$ with the usual Minkowski metric. In
terms of $\vec{u}$, we can write

$$T^{\mu\nu} = (\epsilon_0 + P)u^\mu u^\nu/c_s^2 + \eta^{\mu\nu}P. \qquad (16.145)$$


It is easy to check that this agrees with (16.144) for the particular case $\mathbf{v} = (\beta c_s, 0, 0)$. However, $\epsilon_0$ and $P$ are the energy density and pressure as calculated under the specific boost transformation that annuls the momentum density, and they therefore transform as scalars under boosts. Consequently (16.145) is a 4-tensor equation and is therefore valid for arbitrary boost velocity $\mathbf{v}$.
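
The agreement between the covariant form and the explicitly boosted matrix is easy to confirm numerically. The sketch below builds $T^{\mu\nu} = (\epsilon_0 + P)u^\mu u^\nu/c_s^2 + \eta^{\mu\nu}P$ with $u = (\gamma c_s, \gamma\mathbf{v})$ and $\eta^{\mu\nu} = \mathrm{diag}(-1,1,1,1)$, and compares it with the boosted components for $\mathbf{v} = (\beta c_s, 0, 0)$. The helper name `stress_energy` and the numerical values are assumptions made for the example.

```python
import math

def stress_energy(eps0, P, v, c_s=1.0):
    """T^{mu nu} = (eps0 + P) u^mu u^nu / c_s^2 + eta^{mu nu} P,
    with 4-velocity u = (gamma c_s, gamma v) and eta = diag(-1, 1, 1, 1)."""
    v2 = sum(vi * vi for vi in v)
    gamma = 1.0 / math.sqrt(1.0 - v2 / c_s**2)
    u = [gamma * c_s] + [gamma * vi for vi in v]
    eta = [[0.0] * 4 for _ in range(4)]
    eta[0][0] = -1.0
    for i in range(1, 4):
        eta[i][i] = 1.0
    return [[(eps0 + P) * u[m] * u[n] / c_s**2 + eta[m][n] * P
             for n in range(4)] for m in range(4)]

# Illustrative (assumed) values; c_s is deliberately not set to 1.
eps0, P, beta, c_s = 3.0, 1.0, 0.6, 2.0
T = stress_energy(eps0, P, [beta * c_s, 0.0, 0.0], c_s)

# Components expected for a boost along x, written out explicitly.
g2 = 1.0 / (1.0 - beta**2)
expected = [
    [g2 * (eps0 + beta**2 * P), beta * g2 * (eps0 + P), 0.0, 0.0],
    [beta * g2 * (eps0 + P), g2 * (beta**2 * eps0 + P), 0.0, 0.0],
    [0.0, 0.0, P, 0.0],
    [0.0, 0.0, 0.0, P],
]
max_diff = max(abs(T[m][n] - expected[m][n])
               for m in range(4) for n in range(4))
print(max_diff)
```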

To summarize, we first expressed the average stress-energy tensor in terms of the power spectrum of the waves (16.140). We then obtained an alternative, but equivalent, expression (16.145) in terms of the five parameters $\epsilon_0$, $P$ and the three components of the boost velocity $\mathbf{v}$ needed to transform from the zero-momentum field to the actual field.

16.11.2 Evolution Equations

So far we have imagined this sea of waves to be perfectly homogeneous. Now imagine that the sea has been rendered locally statistically homogeneous by the wave interactions, but that there is some persistent large-scale inhomogeneity. We can compute the evolution of such macroscopic inhomogeneities by taking the local spatial average of the conservation laws $\dot\epsilon = -\nabla\cdot\mathbf{F}$ and $\dot{\mathbf{p}} = -\nabla\cdot\mathbf{w}$. These can be succinctly combined as the time and space components of the 4-vector equation

$$T^{\mu\nu}{}_{,\nu} \equiv \partial T^{\mu\nu}/\partial x^\nu = 0. \qquad (16.146)$$

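While succinct, this rather obscures the physical content. To reveal this, note that the time component of this set of equations (obtained by setting $\mu = 0$) is, from (16.145),

$$0 = c_s^2 T^{0\nu}{}_{,\nu} = \frac{\partial}{\partial x^\nu}\left[(\epsilon_0 + P) u^0 u^\nu + c_s^2 P \eta^{0\nu}\right] = \frac{\partial}{\partial x^\nu}\left[(\epsilon_0 + P) u^0 u^\nu\right] - c_s \frac{\partial P}{\partial t} \qquad (16.147)$$

where we have used $\eta^{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$ and $x = (c_s t, \mathbf{x})$. The space components are similarly obtained by setting $\mu = i$: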


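$$\begin{aligned} 0 = c_s^2 T^{i\nu}{}_{,\nu} &= \frac{\partial}{\partial x^\nu}\left[(\epsilon_0 + P) u^i u^\nu + c_s^2 P \eta^{i\nu}\right] \\ &= \frac{\partial}{\partial x^\nu}\left[(\epsilon_0 + P) u^0 u^\nu \beta^i\right] + c_s^2 \frac{\partial P}{\partial x^i} \\ &= \beta^i \frac{\partial}{\partial x^\nu}\left[(\epsilon_0 + P) u^0 u^\nu\right] + (\epsilon_0 + P) u^0 u^\nu \frac{\partial \beta^i}{\partial x^\nu} + c_s^2 \frac{\partial P}{\partial x^i} \end{aligned} \qquad (16.148)$$

where, to obtain the second line, we have used $u^i = u^0 \beta^i$ and $\eta^{i\nu} = \delta^{i\nu}$, and in the last step we have simply used the rule for differentiating a product. Comparing with (16.147) we see that the first term on the right hand side can be written as $\beta^i c_s\, \partial P/\partial t = v^i\, \partial P/\partial t$ and (16.148) becomes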


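$$(\epsilon_0 + P)\, \gamma u^\nu \frac{\partial v^i}{\partial x^\nu} = -\left[c_s^2 \frac{\partial P}{\partial x^i} + v^i \frac{\partial P}{\partial t}\right]. \qquad (16.149)$$

But $u^\nu \partial/\partial x^\nu = \gamma(\partial/\partial t + (\mathbf{v}\cdot\nabla))$ and therefore this becomes the vector equation

$$\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\frac{c_s^2}{\gamma^2(\epsilon_0 + P)}\left[\nabla P + \frac{\mathbf{v}}{c_s^2}\frac{\partial P}{\partial t}\right] \qquad (16.150)$$

where we recognize, on the left hand side, the convective derivative of $\mathbf{v}$, the local rate of change of $\mathbf{v}$ as seen by an observer moving at velocity $\mathbf{v}$ (i.e. an observer in whose frame the momentum density vanishes). Equation (16.150) is the relativistic form of the Euler equation.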

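A more useful form for the fourth component of the conservation laws is obtained if we dot (16.146) with $u_\mu$. Multiplying by $c_s^2$, this gives

$$\begin{aligned} 0 = c_s^2 u_\mu T^{\mu\nu}{}_{,\nu} &= u_\mu \frac{\partial}{\partial x^\nu}\left[(\epsilon_0 + P) u^\mu u^\nu + c_s^2 P \eta^{\mu\nu}\right] \\ &= u_\mu u^\mu \frac{\partial [(\epsilon_0 + P) u^\nu]}{\partial x^\nu} + (\epsilon_0 + P) u^\nu u_\mu \frac{\partial u^\mu}{\partial x^\nu} + c_s^2 u^\nu \frac{\partial P}{\partial x^\nu} \\ &= -c_s^2 \frac{\partial [(\epsilon_0 + P) u^\nu]}{\partial x^\nu} + 0 + c_s^2 u^\nu \frac{\partial P}{\partial x^\nu} \\ &= -c_s^2 \left[(\epsilon_0 + P) \frac{\partial u^\nu}{\partial x^\nu} + u^\nu \frac{\partial \epsilon_0}{\partial x^\nu}\right] \end{aligned} \qquad (16.151)$$

where we have used $u_\mu\, \partial u^\mu/\partial x^\nu = (1/2)\, \partial (u \cdot u)/\partial x^\nu = 0$. This then provides an expression for the convective derivative of the energy density:

$$\frac{\partial \epsilon_0}{\partial t} + (\mathbf{v}\cdot\nabla)\epsilon_0 = -\frac{(\epsilon_0 + P)}{\gamma}\left[\frac{\partial \gamma}{\partial t} + \nabla\cdot(\gamma \mathbf{v})\right]. \qquad (16.152)$$

Equations (16.152) and (16.150) take a particularly simple form in the vicinity of a point where the momentum density, and therefore also $\mathbf{v}$, vanishes, since we can then take $\gamma = 1$ to obtain

$$\frac{d\epsilon_0}{dt} = -(\epsilon_0 + P)\nabla\cdot\mathbf{v} \qquad (16.153)$$

and

$$\frac{d\mathbf{v}}{dt} = -\frac{c_s^2 \nabla P}{\epsilon_0 + P}. \qquad (16.154)$$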


These are in fact identical to the relativistically correct energy equation and Euler equation for a collisional gas. They provide four equations for five unknowns $\epsilon_0$, $P$ and $\mathbf{v}$. To close this system of equations we need an equation of state; a rule giving the pressure as a function of $\epsilon_0$. In the following sections we find this relation for the two limiting cases of high spatial frequency waves with $k \gg k_*$, corresponding to the highly relativistic gas, and the opposite case $k \ll k_*$, which corresponds to the non-relativistic gas.


High Frequency Waves

It is very easy to find the equation of state in the limit that the sound waves have very high spatial frequency $|\mathbf{k}| \gg \mu/c_s$. In this case the field is effectively massless and the group velocity is $|\mathbf{v}_g| = c_s$ for essentially all $\mathbf{k}$, so $\langle |\mathbf{v}_g|^2 \rangle = c_s^2$ and therefore $P = \epsilon_0/3$. This is the same as the equation of state for a gas in which the particles are highly relativistic.

The energy equation, in the vicinity of a point where $\mathbf{v} = 0$, in this limit says

$$\frac{d\epsilon_0}{dt} = -\frac{4}{3}\epsilon_0 \nabla\cdot\mathbf{v}. \qquad (16.155)$$

Now $\nabla\cdot\mathbf{v} = V^{-1} dV/dt$, with $V$ the volume, so this tells us that the energy density, and hence also the pressure, changes inversely as the 4/3 power of the volume: $P V^{4/3} = $ constant.
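
As a quick sanity check of the 4/3 scaling, one can integrate $d\epsilon_0/dt = -(4/3)\,\epsilon_0\, V^{-1} dV/dt$ for a prescribed expansion history and confirm that $P V^{4/3}$ stays fixed. The sketch below assumes, purely for illustration, a volume $V(t) = 1 + t$ in arbitrary units and uses a hand-written fourth-order Runge-Kutta step; none of these choices come from the text.

```python
def dlnV_dt(t):
    """Assumed expansion history V(t) = 1 + t, so (1/V) dV/dt = 1/(1 + t)."""
    return 1.0 / (1.0 + t)

def rhs(t, eps):
    """Right hand side of d eps0/dt = -(4/3) eps0 (1/V) dV/dt."""
    return -(4.0 / 3.0) * eps * dlnV_dt(t)

def rk4(eps, t0, t1, n):
    """Integrate from t0 to t1 in n classical Runge-Kutta steps."""
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        k1 = rhs(t, eps)
        k2 = rhs(t + h / 2, eps + h * k1 / 2)
        k3 = rhs(t + h / 2, eps + h * k2 / 2)
        k4 = rhs(t + h, eps + h * k3)
        eps += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return eps

eps_initial, V_initial = 1.0, 1.0
eps_final = rk4(eps_initial, 0.0, 1.0, 1000)
V_final = 2.0

# With P = eps0 / 3 the combination P V^(4/3) should be unchanged.
invariant_initial = (eps_initial / 3.0) * V_initial ** (4.0 / 3.0)
invariant_final = (eps_final / 3.0) * V_final ** (4.0 / 3.0)
print(invariant_initial, invariant_final)
```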

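The general energy and Euler equations are then

$$\frac{\partial \epsilon_0}{\partial t} + (\mathbf{v}\cdot\nabla)\epsilon_0 = -\frac{4\epsilon_0}{3\gamma}\left[\frac{\partial \gamma}{\partial t} + \nabla\cdot(\gamma\mathbf{v})\right] \qquad (16.156)$$

and

$$\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\frac{c_s^2}{4\gamma^2\epsilon_0}\left[\nabla\epsilon_0 + \frac{\mathbf{v}}{c_s^2}\frac{\partial \epsilon_0}{\partial t}\right]. \qquad (16.157)$$

These provide a coupled, but closed, system of 4 equations for the four functions, or fields, $\epsilon_0(\mathbf{x}, t)$, $\mathbf{v}(\mathbf{x}, t)$ (note that $\gamma = 1/\sqrt{1 - v^2/c_s^2}$ is not an independent variable). If these fields, and their derivatives, are known on one time slice, then (16.156) and (16.157) allow one to advance the fields to the next time slice and in this way determine the future evolution from arbitrary initial conditions.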

We started with a field equation for $\phi(\mathbf{x}, t)$, and a rather complicated one at that, with a self-interacting field. We finished with a system of 4 deterministic equations for the scalar $\epsilon_0(\mathbf{x}, t)$ and the vector field $\mathbf{v}(\mathbf{x}, t)$. The significance of the latter is that $\mathbf{v}$ is the boost transformation which locally annuls the momentum density, and $\epsilon_0$ is the energy density of the transformed field. Once we have the solution, we can then determine the stress-energy tensor $T^{\mu\nu}(\mathbf{x}, t)$ using (16.145).


These equations, like the equations of ideal fluid dynamics, are non-linear. For small perturbations about a uniform energy density state, we can linearize and let $\epsilon_0 = \bar\epsilon + \epsilon_1$, and $\mathbf{v} = 0 + \mathbf{v}_1$, where quantities with subscript 1 are assumed to be small perturbations: $\epsilon_1 \ll \bar\epsilon$ and $v_1 \ll c_s$. The latter means we can set $\gamma = 1$ and ignore $\mathbf{v}\cdot\nabla$ compared to $\partial/\partial t$ in the convective derivative. In this limit the energy and Euler equations become

$$\frac{\partial \epsilon_1}{\partial t} = -\frac{4}{3}\bar\epsilon\, \nabla\cdot\mathbf{v}_1 \qquad (16.158)$$

and

$$\frac{\partial \mathbf{v}_1}{\partial t} = -\frac{c_s^2}{4\bar\epsilon}\nabla\epsilon_1 \qquad (16.159)$$

or, on taking the time derivative of (16.158) and the divergence of (16.159) and eliminating the velocity,

$$\frac{\partial^2 \epsilon_1}{\partial t^2} - \frac{c_s^2}{3}\nabla^2\epsilon_1 = 0. \qquad (16.160)$$

This is a wave equation, with wave speed $c_s/\sqrt{3}$. These are very different from the waves of the underlying $\phi$ field; these are waves of wave energy density. This equation, for the evolution of self-interacting sound waves in our solid state lattice system, is exactly analogous to the equation governing acoustic waves in a radiation dominated plasma.
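
The wave speed can be read off without forming the second-order equation. Assuming the linearised system has the form $\partial\epsilon_1/\partial t = -(4/3)\bar\epsilon\,\nabla\cdot\mathbf{v}_1$ and $\partial\mathbf{v}_1/\partial t = -(c_s^2/4\bar\epsilon)\nabla\epsilon_1$, a plane wave $\propto e^{i(kx - \omega t)}$ gives $\omega^2 = a b k^2$, where $a$ and $b$ are the two coupling coefficients. The sketch below (with the hypothetical helper name `phase_speed` and arbitrary sample values) shows that the background energy density drops out.

```python
import math

def phase_speed(eps_bar, c_s):
    """Phase speed omega/k of plane-wave solutions of the linearised
    energy and Euler equations in the high-frequency (P = eps0/3) limit."""
    a = 4.0 * eps_bar / 3.0          # d eps1/dt = -a div v1
    b = c_s**2 / (4.0 * eps_bar)     # d v1/dt   = -b grad eps1
    return math.sqrt(a * b)          # omega^2 = a b k^2

# The result should be c_s / sqrt(3) whatever the mean energy density.
speeds = [phase_speed(eps_bar, 3.0) for eps_bar in (0.1, 2.5, 40.0)]
print(speeds)
```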


Low-Frequency Waves 


Now let’s consider the opposite limit where $k \ll k_*$. As discussed in §16.10 we then have, in addition to energy and momentum, a fifth approximate conservation law. These five conservation laws were given in (16.128), along with the constituent relations (16.129). The latter can be simplified in the case that the interaction coupling constant is large enough that the interactions render the power spectrum isotropic, but small enough that the interaction energy $\lambda\phi^4$ is negligible compared to the other contributions to the energy. One can then drop the terms in (16.129) which involve $\lambda$.

Proceeding much as before, we can assume that the interactions will render the wave-field a locally homogeneous Gaussian random field, and we can compute quantities such as the energy density and particle density in terms of the power spectrum. One can then show that the conservation laws (16.128) reduce to the continuity, Euler and energy equations.


16.12 Discussion 

We have shown here how one can treat continuous media using the Lagrangian formalism. We started with a simple ‘solid state’ system consisting of a lattice of coupled oscillators and determined the discrete set of normal modes, which are the discrete Fourier transform of the displacements $\phi_j$. We then took the continuum limit, and showed that the action is the 2-dimensional space-time integral of the Lagrangian density $\mathcal{L}(\phi, \nabla\phi, \dot\phi)$, which is a quadratic function of the field and its derivatives. The condition $\delta S = 0$ generated the equation of motion of the system in the form of a 2nd order differential equation in space and time. The normal modes are again Fourier modes (continuous Fourier modes now). The generalization to 2 or 3 spatial dimensions is straightforward.

We then showed that the symmetry of the Lagrangian density under spatial translations gave rise to conservation of wave momentum. Applying this to non-interacting wave packets we found that energy and momentum are related by $\mathbf{P}/E = \mathbf{k}/\omega_k$. We also investigated multiple, possibly coupled, fields. We saw that a two component field can have additional conservation laws which look suspiciously like conservation of electric charge, or some other conserved quantity, and we showed that low-energy (i.e. non-relativistic) waves display a further conservation law analogous to particle conservation. Finally, we explored in some detail the behaviour of interacting fields with interactions sufficiently weak that they do not substantially modify the energy of the system, but sufficiently strong that they frustrate the propagation of wave packets over large distances. We found that the energy and momentum density in such self-interacting wave systems obey a closed system of equations which are precisely like those obeyed by a relativistic collisional gas of pointlike particles, in the ideal fluid limit (see e.g. Weinberg’s book). What we have seen here is a classical wave-particle correspondence principle; we can think of an ideal fluid as a gas of classical particles which collide with each other or as a field of self-interacting waves. The two models for the ‘underlying real system’ give identical results for the laws governing the macroscopic propagation of energy and momentum.

The underlying discrete lattice model was chosen so that the field equation, dispersion relation, wave-packet energy-momentum relation etc. mimic the properties of the relativistic massive scalar field (which we will introduce in chapter 18). The mechanical system introduced here, and explored quantum mechanically in the following chapter, therefore provides a useful, concrete and conceptually less challenging analog of the more abstract relativistic scalar field. In employing this analogy one should be careful how to interpret the field. In the mechanical model, the field happens to be a spatial displacement. However, the direction of the field was arbitrary; in the 1-dimensional model we drew the beads oscillating vertically, but they could equally well have been drawn oscillating horizontally and either transverse or parallel to the lattice. In any of these cases the Lagrangian density is the same. Similarly, in 2 or 3 dimensions, the direction of motion was quite arbitrary. This is quite different from waves in real elastic solids, where the displacement is a vector. The model we have constructed is for a scalar disturbance; the displacement is best thought of as lying in an abstract 1-dimensional internal space. What we have called the microscopic momentum (the sum of the conjugate momenta) ‘lives’ in this abstract space. The wave-momentum, in contrast, is a true vector quantity in the real space, which transforms in the usual way under spatial rotations, for instance.

The invariance of such classical field theories under what are formally identical to Lorentz transformations is rather interesting, and encourages a rather non-standard view of relativistic invariance. As already emphasised, the system here is not truly relativistically covariant; if we imagine a two-dimensional system, then if we observe the propagation of waves on this surface using photons (or other particles with a propagation speed much greater than $c_s$, the asymptotic velocity for high frequency elasticity waves) then we can readily infer the frame of rest of the underlying lattice; it is that frame for which the dispersion relation is independent of wave-momentum direction. Using photons, one can, if you like, measure the aether drift for these elasticity waves. However, the wave-systems considered here allow quite rich behaviour, encompassing analogs of relativistic and non-relativistic particles, and interactions between waves allow quite complex solutions to develop. What if we imagine some being, or perhaps something like a cellular automaton, which lives in this 2-dimensional ‘plani-verse’ and is constructed out of these fields (just as we imagine ourselves to be, in reality, complex solutions of a rather simple set of fundamental field equations): can such an entity measure the aether drift? The answer is that there is no observation that this entity can make which can reveal the underlying lattice rest-frame, provided all the fields in the system have identical asymptotic sound speed $c_s$. The fields may have quite different ‘base-spring’ strengths (i.e. different masses) and the base-springs may be non-linear or may have pointlike interactions with other fields, but provided the bead-mass and connecting-spring constants give the same asymptotic sound speed, the system is, internally, fully covariant and any aether drift measurements are doomed to failure. If, on the other hand, the connecting spring constants and bead masses give different $c_s$ values for the different fields, then the Lorentz-like transformations which preserve the field equations depend on the particular field; the transformation under which one field remains a solution does not preserve the field equations for other fields.

Thus, it would seem to be quite consistent to believe that the world we inhabit is ‘in reality’ a Galilean system, with absolute time etc., but that all of the fields in our universe happen to have the same asymptotic propagation speed, so the world ‘appears’ to be Lorentz invariant. If the world is just fields, and if all of the fields have identical $c_s$, then this is meta-physics; if we cannot detect the aether, then we should not introduce it in our physical description of the Universe. However, if we were ever to be able to demonstrate the existence of, say, ‘tachyonic’ behaviour, e.g. apparently causal correlations between the field values at points with space-like separations, then this could readily be incorporated in classical field theory, simply by introducing fields with different $c_s$ values.

Finally, there is an important inadequacy in the interacting field theories we have developed. We were only able to establish the wave-particle correspondence principle in the limits of highly relativistic or highly non-relativistic waves. In the particle description one can treat collisional gases at all energies, if we introduce the additional requirement that the gas be locally in thermal equilibrium. This extra ingredient provides the distribution function for particle momenta, and provides a general equation of state. Only in the limits we have considered are the evolutionary equations for the wave energy independent of the details of the momentum distribution. If we try to introduce thermal equilibrium in a classical wave system we immediately run into problems; the equilibrium state should have equal energy per oscillation mode, and this is counter to observations. The resolution, of course, is that the world is not classical, but quantum mechanical, and the energy states of the modes are not continuous but are discretized in units of $\hbar\omega$. In the next chapter we will develop the proper quantum mechanical description of wave systems.



Chapter 17 

Quantum Fields 


Hamiltonian dynamics provides a direct path from the classical to the quantum description of a system. The canonical quantization procedure consists of replacing the position and momentum variables $q$ and $p$ with non-commuting operators which act on quantum mechanical states. For the discrete lattice model these operators are the displacements $\phi_j$ and momenta $p_j = \partial L/\partial\dot\phi_j = M\dot\phi_j$.


17.1 The Simple Harmonic Oscillator 

Consider a single simple harmonic oscillator, with Hamiltonian

$$H = p^2/2m + kx^2/2. \qquad (17.1)$$

One approach to this is to solve Schroedinger’s equation $E\psi = H\psi$ (with the usual substitutions $E \rightarrow i\hbar\,\partial/\partial t$ and $p \rightarrow -i\hbar\,\partial/\partial x$) to obtain the wave-function $\psi$. This gives the well known result that the energy levels are quantized with

$$E_n = (n + 1/2)\hbar\omega \qquad (17.2)$$

with $\omega = \sqrt{k/m}$. There is, however, a different approach using creation and destruction operators, which proves to be of great value in field theory.

These operators are defined to be

$$a = \frac{1}{\sqrt{2\hbar m\omega}}(m\omega x + ip), \qquad a^\dagger = \frac{1}{\sqrt{2\hbar m\omega}}(m\omega x - ip). \qquad (17.3)$$

If $x$ and $p$ were classical variables then the product $aa^\dagger$ would be $H/\hbar\omega$. But $x$ and $p$ are really non-commuting operators obeying the commutation relation

$$[x, p] = xp - px = i\hbar. \qquad (17.4)$$

Now the two possible products of $a$ and $a^\dagger$ are

$$aa^\dagger = \frac{1}{2\hbar m\omega}\left(m^2\omega^2x^2 - im\omega(xp - px) + p^2\right), \qquad a^\dagger a = \frac{1}{2\hbar m\omega}\left(m^2\omega^2x^2 + im\omega(xp - px) + p^2\right) \qquad (17.5)$$

from which follow two important results:

• Subtracting these and using (17.4) gives the commutator of $a$ and $a^\dagger$:

$$[a, a^\dagger] = 1. \qquad (17.6)$$

• Adding these gets rid of the cross terms and we find that the Hamiltonian is proportional to the symmetrized product of $a$ and $a^\dagger$:

$$H = \frac{\hbar\omega}{2}(aa^\dagger + a^\dagger a) \qquad (17.7)$$

or as $H = \hbar\omega\{a, a^\dagger\}/2$, where the ‘anti-commutator’ is defined as $\{a, a^\dagger\} \equiv aa^\dagger + a^\dagger a$.

Why are these called creation and destruction operators (sometimes ‘ladder operators’)? To see this let’s apply the operator $H$ to the state $|aE\rangle = a|E\rangle$, where $|E\rangle$ is the energy eigenstate with energy $E$ (i.e. $H|E\rangle = E|E\rangle$):

$$\begin{aligned} H|aE\rangle = Ha|E\rangle &= \frac{\hbar\omega}{2}(aa^\dagger a + a^\dagger aa)|E\rangle \\ &= \frac{\hbar\omega}{2}\left(a(aa^\dagger - 1) + (aa^\dagger - 1)a\right)|E\rangle \\ &= a(H - \hbar\omega)|E\rangle = (E - \hbar\omega)a|E\rangle = (E - \hbar\omega)|aE\rangle \end{aligned} \qquad (17.8)$$

where we have used the commutation relation (17.6) to replace $a^\dagger a$ by $aa^\dagger - 1$. Evidently the state $|aE\rangle$ is also an energy eigenstate, but with energy one quantum lower than the state $|E\rangle$, so effectively the operator $a$ has destroyed one quantum of energy. Identical reasoning shows that $H|a^\dagger E\rangle = (E + \hbar\omega)|a^\dagger E\rangle$, so $a^\dagger$ has the effect of increasing the energy by one quantum.

These results imply that the energy eigenstates are also eigenstates of the operator $N = a^\dagger a$, since this first lowers the energy but then raises it again. What are the eigenvalues $n$ of this operator? The eigenvalue equation is $N|n\rangle = n|n\rangle$. Now using (17.6) we find that

$$Na|n\rangle = a^\dagger aa|n\rangle = (aa^\dagger a - a)|n\rangle = a(N - 1)|n\rangle = (n - 1)a|n\rangle \qquad (17.9)$$

so the state $a|n\rangle$ has eigenvalue one lower than the state $|n\rangle$, and similarly $N|a^\dagger n\rangle = (n + 1)|a^\dagger n\rangle$. Now $a$ applied to the lowest energy state $|n_{\rm min}\rangle$ must vanish, as must $a^\dagger a|n_{\rm min}\rangle$, and therefore the eigenvalue of the lowest energy state, $\langle n_{\rm min}|a^\dagger a|n_{\rm min}\rangle$, must vanish: $n_{\rm min} = 0$, and the eigenvalues are just the integers $n = 0, 1, 2, \ldots$ The operator $N$ is called the number operator.

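What about the normalization of the states $|an\rangle$? These are not normalized, since the bra corresponding to the ket $|an\rangle$ is $\langle an| = \langle n|a^\dagger$ and so $\langle an|an\rangle = \langle n|a^\dagger a|n\rangle = n$. If we require that these eigenstates be normalized so $\langle n|n\rangle = 1$ for all $n$, then $\langle an|an\rangle = n = n\langle n - 1|n - 1\rangle$ and similarly $\langle a^\dagger n|a^\dagger n\rangle = n + 1 = (n + 1)\langle n + 1|n + 1\rangle$, and therefore

$$a|n\rangle = n^{1/2}|n - 1\rangle, \qquad a^\dagger|n\rangle = (n + 1)^{1/2}|n + 1\rangle. \qquad (17.10)$$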


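We can now work out the energy eigenvalues $E_n$. Writing out all the steps in gory detail, these are

$$\begin{aligned} E_n = \langle n|H|n\rangle &= \frac{\hbar\omega}{2}\langle n|aa^\dagger + a^\dagger a|n\rangle \\ &= \frac{\hbar\omega}{2}\left[\sqrt{n + 1}\,\langle n|a|n + 1\rangle + \sqrt{n}\,\langle n|a^\dagger|n - 1\rangle\right] \\ &= \frac{\hbar\omega}{2}\left[(n + 1)\langle n|n\rangle + n\langle n|n\rangle\right] \\ &= (n + 1/2)\hbar\omega. \end{aligned} \qquad (17.11)$$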
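
The ladder-operator algebra can be made concrete by writing $a$ and $a^\dagger$ as matrices in a truncated number basis, using $a|n\rangle = \sqrt{n}\,|n-1\rangle$. The sketch below is an illustration only: the basis size `dim` and the helper names are arbitrary choices, and energies are expressed in units of $\hbar\omega$. Because the basis is cut off, the last diagonal entry of each product is a truncation artifact and is excluded from the checks.

```python
import math

def ladder_matrices(dim):
    """Matrices of a and a-dagger in the basis |0>, ..., |dim-1>."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)   # <n-1| a |n> = sqrt(n)
    adag = [[a[c][r] for c in range(dim)] for r in range(dim)]  # transpose
    return a, adag

def matmul(A, B):
    """Plain matrix product of two square matrices stored as nested lists."""
    dim = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(dim))
             for j in range(dim)] for i in range(dim)]

dim = 6
a, adag = ladder_matrices(dim)
a_adag = matmul(a, adag)
adag_a = matmul(adag, a)

# Diagonal of the commutator [a, a-dagger]; equal to 1 below the cutoff.
commutator_diag = [a_adag[n][n] - adag_a[n][n] for n in range(dim)]
# Eigenvalues of the number operator N = a-dagger a.
number_diag = [adag_a[n][n] for n in range(dim)]
# H / (hbar omega) = (a a-dagger + a-dagger a) / 2 on the diagonal.
energies = [0.5 * (a_adag[n][n] + adag_a[n][n]) for n in range(dim)]
print(commutator_diag[:-1], number_diag, energies[:-1])
```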


Finally, note that one can express the physical operators $p$, $x$ in terms of the creation and destruction operators as

$$x = \sqrt{\hbar/2\omega m}\,(a^\dagger + a), \qquad p = i\sqrt{\hbar\omega m/2}\,(a^\dagger - a). \qquad (17.12)$$

These expressions allow one to compute expectation values such as the mean square displacement $\langle n|x^2|n\rangle$ in some state $|n\rangle$ using operator algebra, without requiring any definite prescription for the wave-function of the state. More important for us, physical operators appearing in the interaction Hamiltonian can be expressed in terms of the creation and destruction operators, which is very useful if we want to compute the rate at which interactions inject energy or scatter excitations.

17.2 The Interaction Picture

Quantum mechanics was initially developed in two quite independent ways by Schroedinger and by Heisenberg. These two approaches give identical results for the expectation values of physical operators $\langle\psi|Y|\psi\rangle$, but this still leaves considerable freedom in the definition of the states and operators.

In the Schroedinger picture the operators (such as $x$, $p$ for the simple harmonic oscillator) are considered fixed in time while the state $|\psi_S\rangle$, and with it the wave function, evolves in time according to the time dependent Schroedinger equation

$$i\hbar\frac{\partial|\psi_S\rangle}{\partial t} = H|\psi_S\rangle \qquad (17.13)$$

the formal solution to which is

$$|\psi_S(t)\rangle = e^{-iH(t - t_0)/\hbar}|\psi_S(t_0)\rangle. \qquad (17.14)$$


In the Heisenberg picture the states $|\psi\rangle$ are considered fixed in time, so we can take $|\psi_H\rangle = |\psi_S(t_0)\rangle$ say, and the equality $\langle\psi_S|Y_S|\psi_S\rangle = \langle\psi_H|Y_H|\psi_H\rangle$ implies that the operators in the two pictures are related by

$$Y_H = e^{iH(t - t_0)/\hbar}\, Y_S\, e^{-iH(t - t_0)/\hbar} \qquad (17.15)$$

and differentiating this with respect to $t$ (with $Y_S$ constant) gives the ‘equation of motion’ for operators in the Heisenberg picture

$$i\hbar\frac{dY_H}{dt} = [Y_H, H]. \qquad (17.16)$$


In what follows we shall be interested in systems in which the total Hamiltonian is the sum of the free field Hamiltonian $H_0$, whose eigenstates have fixed occupation numbers, and an ‘interaction term’:

$$H = H_0 + H_{\rm int}. \qquad (17.17)$$

In this context it proves useful to work in the interaction picture, which is a hybrid of the Heisenberg and Schroedinger pictures in which the operators evolve in time as

$$Y_I = e^{iH_0(t - t_0)/\hbar}\, Y_S\, e^{-iH_0(t - t_0)/\hbar} \qquad (17.18)$$

i.e. like the Heisenberg picture operators of the free field theory, and the equation of motion for the operators is

$$i\hbar\frac{dY_I}{dt} = [Y_I, H_0], \qquad (17.19)$$

while the states evolve as

$$i\hbar\frac{\partial}{\partial t}|\psi_I\rangle = H_{\rm int}|\psi_I\rangle. \qquad (17.20)$$


For example, for the simple harmonic oscillator $H_0 = \hbar\omega\{a, a^\dagger\}/2$, and, using the commutation relation (17.6), the equations of motion for $a$, $a^\dagger$ become

$$da(t)/dt = -i\omega a(t), \qquad da^\dagger(t)/dt = i\omega a^\dagger(t) \qquad (17.21)$$

the solutions to which are

$$a(t) = a e^{-i\omega t}, \qquad a^\dagger(t) = a^\dagger e^{i\omega t} \qquad (17.22)$$

where $a$, $a^\dagger$ are the initial values of the operators. Note that the commutation relation (17.6) applies to $a(t)$, $a^\dagger(t)$ also: $[a(t), a^\dagger(t)] = [a, a^\dagger] = 1$.

In the interaction picture, the operators evolve rapidly, on timescale $\tau \sim \hbar/H_0$, while the states evolve slowly, on timescale $\tau \sim \hbar/H_{\rm int}$.

17.2.1 The S-Matrix Expansion

If we start at $t_1$ with the system in some state $|i\rangle$ then on integrating (17.20) we have for the state at some later time $t$

$$|\psi(t)\rangle = |i\rangle + \frac{1}{i\hbar}\int_{t_1}^{t} dt'\, H_{\rm int}(t')|\psi(t')\rangle. \qquad (17.23)$$

This is an implicit equation for $|\psi(t)\rangle$. We can obtain an explicit solution as a power series expansion in the strength of the interaction as follows. To zeroth order in the interaction nothing happens; the zeroth order solution is $|\psi^{(0)}(t)\rangle = |i\rangle$. We can obtain a better approximation by inserting $|\psi^{(0)}\rangle = |i\rangle$ on the RHS of (17.23). We can then put this improved solution in the RHS and so on. Successive approximations are

$$\begin{aligned} |\psi^{(0)}(t)\rangle &= |i\rangle \\ |\psi^{(1)}(t)\rangle &= |i\rangle + \frac{1}{i\hbar}\int_{t_1}^{t} dt'\, H_{\rm int}(t')|i\rangle \\ |\psi^{(2)}(t)\rangle &= |i\rangle + \frac{1}{i\hbar}\int_{t_1}^{t} dt'\, H_{\rm int}(t')|i\rangle + \left(\frac{1}{i\hbar}\right)^2\int_{t_1}^{t} dt'\, H_{\rm int}(t')\int_{t_1}^{t'} dt''\, H_{\rm int}(t'')|i\rangle \end{aligned} \qquad (17.24)$$

The development of the various terms in this expansion can be expressed as a recursion relation

$$|\psi^{(n)}(t)\rangle = |i\rangle + \frac{1}{i\hbar}\int_{t_1}^{t} dt'\, H_{\rm int}(t')|\psi^{(n-1)}(t')\rangle. \qquad (17.25)$$

This expansion is the basis for essentially all calculations in perturbative field theory. It is known as the ‘S-matrix expansion’. It gives the time evolution of the states as $|\psi(t)\rangle = U|i\rangle$, where $U$ is the time-evolution operator, from which we can compute the amplitude for the system to be in some state $|f\rangle$ as $\langle f|U|i\rangle$.


17.2.2 Example: A Forced Oscillator 

As an illustration, let us add a term $H_{\rm int} = F(t)x$ to the simple harmonic oscillator. This adds a term $L_{\rm int} = -H_{\rm int}$ to the Lagrangian, and the equation of motion is then $m\ddot{x} = \partial L/\partial x = -kx - F(t)$, so, up to a sign that does not affect the transition probabilities, $F(t)$ represents an external force which we treat classically. The operator corresponding to this classical interaction is

$$H_{\rm int}(t) = F(t)\sqrt{\hbar/2\omega m}\,(a^\dagger(t) + a(t)) \qquad (17.26)$$

and, to first order in the interaction, the state will evolve to

$$|\psi(t)\rangle = |\psi(t_0)\rangle - \frac{i}{\sqrt{2\omega m\hbar}}\int_{t_0}^{t} dt'\, F(t')\left[e^{i\omega t'}a^\dagger + e^{-i\omega t'}a\right]|\psi(t_0)\rangle. \qquad (17.27)$$

If we assume the state is initially in the vacuum, so $|\psi(t_0)\rangle = |0\rangle$, then the destruction operator has no effect, since $a|0\rangle = 0$, but $a^\dagger|0\rangle = |1\rangle$ so

$$|\psi(t)\rangle = |0\rangle - \frac{i\int dt'\, F(t')e^{i\omega t'}}{\sqrt{2m\omega\hbar}}|1\rangle \qquad (17.28)$$

and the amplitude for the system to be in the excited state $|1\rangle$ at time $t_f$ given that it was in the ground state $|0\rangle$ at time $t_i$ is

$$\langle 1|\psi(t_f)\rangle = -\frac{i}{\sqrt{2m\omega\hbar}}\int_{t_i}^{t_f} dt\, F(t)e^{i\omega t} \qquad (17.29)$$

which is proportional to the transform of the force at the oscillator frequency.

Squaring the amplitude gives the probability for the transition as

$$P(0 \rightarrow 1) = |\langle 1, t_f|0, t_i\rangle|^2 = \frac{1}{2m\omega\hbar}\int_{t_i}^{t_f} dt\int_{t_i}^{t_f} dt'\, F(t)F(t')e^{i\omega(t - t')}. \qquad (17.30)$$

For example, if the force is some random function of time, and acts for total time $T$, this is

$$P(0 \rightarrow 1) = \frac{T}{2m\omega\hbar}\int d\tau\, \xi_F(\tau)e^{i\omega\tau} \qquad (17.31)$$

where

$$\xi_F(\tau) = \frac{1}{T}\int_0^T dt\, F(t)F(t + \tau) \qquad (17.32)$$

is the two-point function for the force. Invoking the Wiener-Khinchin theorem gives

$$P(0 \rightarrow 1) = \frac{T P_F(\omega)}{2m\omega\hbar} \qquad (17.33)$$

so the probability that the system gets excited is proportional to the duration that the perturbation is switched on and to the power in the force at the frequency of the oscillator, an eminently reasonable result.
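
The first-order transition probability is simple to evaluate numerically for any given force history. The sketch below assumes the first-order result $P(0\rightarrow 1) = |\int dt\, F(t)e^{i\omega t}|^2/(2m\omega\hbar)$ and evaluates the integral with the trapezoid rule. The function name `transition_probability`, the constant test force and all parameter values are assumptions made for illustration; the force is kept weak so that the probability is small, as first-order perturbation theory requires.

```python
import cmath
import math

def transition_probability(force, omega, m, hbar, t_i, t_f, n=20000):
    """First-order P(0 -> 1) = |int dt F(t) exp(i omega t)|^2 / (2 m omega hbar),
    with the time integral done by the trapezoid rule on n intervals."""
    h = (t_f - t_i) / n
    total = 0j
    for j in range(n + 1):
        t = t_i + j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * force(t) * cmath.exp(1j * omega * t)
    total *= h
    return abs(total) ** 2 / (2.0 * m * omega * hbar)

# Illustrative values in units where m = hbar = 1.
F0, omega, m, hbar, T = 0.1, 2.0, 1.0, 1.0, 3.0
p_numeric = transition_probability(lambda t: F0, omega, m, hbar, 0.0, T)

# For a constant force the integral is elementary:
# |int_0^T exp(i omega t) dt|^2 = 4 sin^2(omega T / 2) / omega^2.
p_exact = (F0**2 * 4.0 * math.sin(omega * T / 2.0) ** 2 / omega**2
           / (2.0 * m * omega * hbar))
print(p_numeric, p_exact)
```

A constant force has little power at the oscillator frequency, so the probability merely oscillates with $T$ rather than growing; the linear growth with $T$ in the text arises for a random force with a smooth power spectrum.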


17.3 Free Fields 

We can apply these concepts to the non-interacting field theory, since, as we have seen, this is a 
set of non-interacting simple harmonic oscillators. Thus we should be able to construct a pair of 
creation and destruction operators for each mode of the field. We will first do this in detail for the 
case of the 1-dimensional discrete lattice. The transition to a 3-dimensional continuous system is 
then straightforward. 


17.3.1 Discrete 1-Dimensional Lattice Model 

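Let us assume a discrete lattice model consisting of a ring of $N$ coupled oscillators. It turns out that one can write the displacement operator $\phi_j(t)$ as a sum over normal modes:

$$\phi_j(t) = \sum_k \sqrt{\frac{\hbar}{2MN\omega_k}}\left(a_k^\dagger e^{i(\omega_k t - 2\pi jk/N)} + a_k e^{-i(\omega_k t - 2\pi jk/N)}\right) \qquad (17.34)$$

where $a_k$, $a_k^\dagger$ are operators which respectively destroy and create excitations in the mode with wave number $k$.

To justify this we need to show first that these operators have the appropriate commutation relations, and second that they have the appropriate relationship to the Hamiltonian.

To do this we shall need the analogous expression for the velocities $\dot\phi_j$, which are just the time derivatives of the displacements $\phi_j$ given by (17.34):

$$\dot\phi_j(t) = i\sum_k \sqrt{\frac{\hbar\omega_k}{2MN}}\left(a_k^\dagger e^{i(\omega_k t - 2\pi jk/N)} - a_k e^{-i(\omega_k t - 2\pi jk/N)}\right). \qquad (17.35)$$

Next, we relate $a_k$ and $a_k^\dagger$ to the (spatial) discrete transforms of $\phi_j$ and $\dot\phi_j$ defined as

$$\Phi_k(t) = \sum_j \phi_j(t)e^{i2\pi jk/N} \quad \text{and} \quad \dot\Phi_k(t) = \sum_j \dot\phi_j(t)e^{i2\pi jk/N}. \qquad (17.36)$$

With $a_k(t) = a_k e^{-i\omega_k t}$ and $a_k^\dagger(t) = a_k^\dagger e^{i\omega_k t}$, the first of these is

$$\Phi_k(t) = \sum_j\sum_{k'} \sqrt{\frac{\hbar}{2MN\omega_{k'}}}\left(a_{k'}^\dagger(t)e^{i2\pi j(k - k')/N} + a_{k'}(t)e^{i2\pi j(k + k')/N}\right) \qquad (17.37)$$

and similarly for $\dot\Phi_k$, but $\sum_j e^{i2\pi j(k \pm k')/N} = N\delta_{k, \mp k'}$ and so

$$\Phi_k(t) = \sqrt{\frac{\hbar N}{2M\omega_k}}\left(a_k^\dagger(t) + a_{-k}(t)\right), \qquad \dot\Phi_k(t) = i\sqrt{\frac{\hbar N\omega_k}{2M}}\left(a_k^\dagger(t) - a_{-k}(t)\right). \qquad (17.38)$$

Solving for $a_k^\dagger(t)$ and $a_k(t)$, we have

$$a_k^\dagger(t) = \sqrt{\frac{M\omega_k}{2\hbar N}}\left(\Phi_k(t) + \dot\Phi_k(t)/i\omega_k\right), \qquad a_k(t) = \sqrt{\frac{M\omega_k}{2\hbar N}}\left(\Phi_{-k}(t) - \dot\Phi_{-k}(t)/i\omega_k\right). \qquad (17.39)$$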
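
The step that collapses the double sum over sites and modes rests on the discrete orthogonality relation $\sum_j e^{i2\pi jq/N} = N$ when $q$ is a multiple of $N$ and zero otherwise. A short numerical check, with an arbitrarily chosen ring size $N = 8$ and the hypothetical helper name `mode_sum`, might look as follows.

```python
import cmath

def mode_sum(N, q):
    """Sum over lattice sites j = 0..N-1 of exp(i 2 pi j q / N)."""
    return sum(cmath.exp(2j * cmath.pi * j * q / N) for j in range(N))

N = 8
same = mode_sum(N, 3 - 3)        # k = k': every term is 1, so the sum is N
different = mode_sum(N, 3 - 5)   # k != k': the phases cancel around the ring
wrapped = mode_sum(N, 3 + 5)     # k + k' = N, i.e. k' = -k on the ring
print(abs(same), abs(different), abs(wrapped))
```

The third case illustrates why the second term in the transform picks out $k' = -k$: on a ring of $N$ sites the wave numbers $k'$ and $k' - N$ label the same mode.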


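We are now ready to compute the commutator $[a_k, a_{k'}^\dagger]$. To obtain this we first need the commutators for the $\Phi_k$, $\dot\Phi_k$ operators. Since $\Phi_k$ contains only displacement operators $\phi_j$ it commutes with itself, and similarly for $\dot\Phi_k$, which contains only momentum operators. The only non-zero contributions to the commutator come from terms like $[\Phi_k, \dot\Phi_{-k'}]$, which is

$$[\Phi_k, \dot\Phi_{-k'}] = \left[\sum_l \phi_l e^{i2\pi kl/N}, \sum_m \dot\phi_m e^{-i2\pi k'm/N}\right] = \sum_l\sum_m e^{i2\pi(kl - k'm)/N}[\phi_l, \dot\phi_m] \qquad (17.40)$$

but $\dot\phi_m = p_m/M$ so $[\phi_l, \dot\phi_m] = [\phi_l, p_m]/M = i\hbar\delta_{lm}/M$ (the $\delta_{lm}$ here expressing the fact that the operators for displacement and momentum at different sites on the lattice commute), and therefore

$$[\Phi_k, \dot\Phi_{-k'}] = \frac{i\hbar}{M}\sum_l e^{i2\pi(k - k')l/N} = \frac{i\hbar N}{M}\delta_{kk'}. \qquad (17.41)$$

The commutator of the $a$, $a^\dagger$ operators is, from (17.39),

$$[a_k(t), a_{k'}^\dagger(t)] = \frac{M\sqrt{\omega_k\omega_{k'}}}{2\hbar N}\left[\Phi_{-k}(t) - \frac{\dot\Phi_{-k}(t)}{i\omega_k},\; \Phi_{k'}(t) + \frac{\dot\Phi_{k'}(t)}{i\omega_{k'}}\right] \qquad (17.42)$$

or, using (17.41),

$$[a_k(t), a_{k'}^\dagger(t)] = \delta_{kk'}. \qquad (17.43)$$

For $k = k'$ this commutator is identical to the commutator $[a, a^\dagger]$ for a single oscillator and the commutator for different modes $k \neq k'$ vanishes.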

We can also express the Hamiltonian in terms of $a_k$, $a_k^\dagger$ as follows. If we go back to the expression for the Lagrangian (16.1), convert this to the Hamiltonian by changing the sign of the potential energy terms, and substitute $\phi_j \rightarrow N^{-1}\sum_k \Phi_k e^{-2\pi ijk/N}$ we find that the classical Hamiltonian becomes

$$H = \frac{M}{2N}\sum_k\left(\dot\Phi_k\dot\Phi_k^* + \omega_k^2\Phi_k\Phi_k^*\right) = \frac{M}{2N}\sum_k\left(\dot\Phi_k\dot\Phi_{-k} + \omega_k^2\Phi_k\Phi_{-k}\right) \qquad (17.44)$$

and replacing the classical variables by operators we find

$$H = \sum_k \frac{\hbar\omega_k}{2}\left(a_k a_k^\dagger + a_k^\dagger a_k\right) \qquad (17.45)$$

which, as expected, is just a sum over the modes of the Hamiltonian for each oscillator mode taken separately.

From here one can argue exactly as before that the operator $a_k$ has the effect of reducing the energy in mode $k$ by one quantum and $a_k^\dagger$ increases it by the same amount. Applying the creation operators to the vacuum generates the multi-particle eigenstates (15.1) etc.

This much we might have reasonably anticipated from the results for a single oscillator. The reason for going through the laborious analysis above is to obtain (17.34) and (17.35) for the displacement operator and its conjugate momentum in terms of the ladder operators. This means that if we can express the interaction Hamiltonian $H_{\rm int}$ in terms of the displacement, or the velocity, then we can also express it in terms of the $a_k$, $a_k^\dagger$ operators. We can then calculate transition rates with relative ease.


17.3.2 Continuous 3-Dimensional Field 

The transition from the discrete model to a continuous field is mathematically straightforward. In 
one dimension we simply replace the position index j by x = jAx and the wave-number index k by 
the physical wave number k p h ys = 2irk/L 1 so the factor 2njk/N = k p j lys :r and similarly in higher 
dimensions 27rj • k /N —> k p h ys • x. The factor MN = M to t is just the total mass of all the beads and 
the (now continuous) displacement and velocity fields become 


*(*>*) = E (4e i( “ k ‘- kx) + a k e-^ kt ' kx >) 

0(x, t) = Jmci (4 e * ( “ kt_kx) - a k e- i(aJk4 - kx) ) 

k V \ / 


(17.46) 


Note that we are working here in a box of size L so that we still have a discrete set of modes. Since 
L —> oo we could replace the discrete sums here by continuous integrals, but the discrete form proves 
to be more convenient for our purposes. 

The operators appearing here still satisfy the commutation relation 

[a k ,4'] = <W- (17.47) 


and the Hamiltonian can be written as 


H = - /iw k (a k «k + 4 fl k) 


(17.48) 


exactly as before. These operators can be used to create and destroy quanta, and to construct the 
energy eigenstates of the system systematically. The energy eigenstates of the free field are simply
the multi-particle eigenstates (15.11), which are the product of the states for the individual modes.

One might worry whether we are really allowed to carry over the commutation relations $[\phi_l, p_m] = i\hbar\delta_{lm}$
for the discrete position and momentum operators; here this says that the field and momentum
operators at two different points commute, regardless of how close they are. Ultimately, the
justification for this physical assumption is the extent to which the resulting field theory adequately
describes nature.


17.4 Interactions 

Consider a 2- or 3-dimensional discrete lattice model. Imagine the system is initially in some eigen¬ 
state of the free field Hamiltonian and we then ‘switch on’ some small additional interaction term 
(which will be specified presently) for some period of time. The interaction will cause the state to 
deviate from the fixed occupation number state, and we will have generally non-zero amplitudes, 
and hence probabilities, for the system to be found with different occupation numbers after the 
interaction. 

We will illustrate this with several examples which we work through in detail. We will compute 
the scattering of phonons off an impurity in the lattice and then scattering of phonons with each 
other via non-harmonic (i.e. non-quadratic) behavior of the spring potential energy. Both of these
phenomena can be described using first order perturbation theory. We then explore the scattering
of phonons via exchange of a virtual particle; an example of a second order perturbation theory 
calculation. 

17.4.1 Scattering off an Impurity 

What happens if we introduce an ‘impurity’ in the lattice and make one of the beads, the one at the
origin of coordinates for concreteness, abnormally heavy with mass $M + \Delta M$? This will introduce a
perturbation or interaction term to the Hamiltonian so we can write

$$
H = H_0 + H_{\rm int} \tag{17.49}
$$

with $H_0$ the free-field Hamiltonian and with

$$
H_{\rm int}(t) = \frac{1}{2}\Delta M\,\dot\phi(0,t)^2 \tag{17.50}
$$

the interaction term.

Classically, we expect that if we were to ‘illuminate’ such an impurity with a monochromatic, 
beamed sound wave then this would give rise to a spherical outgoing scattered wave of the same 
frequency, quite analogous to Thomson scattering of an electromagnetic wave by an electron. 

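To treat this quantum mechanically, we first write $\dot\phi(0,t)$ in $H_{\rm int}(t)$ as a sum over creation and
destruction operators:

$$
\dot\phi(0,t) = i\sum_\mathbf{k}\sqrt{\frac{\hbar\omega_\mathbf{k}}{2M_{\rm tot}}}\left(a_\mathbf{k}^\dagger(t) - a_\mathbf{k}(t)\right) \tag{17.51}
$$

and so the interaction term can be written as a double sum

$$
H_{\rm int}(t) = -\frac{\hbar\,\Delta M}{4M_{\rm tot}}\sum_\mathbf{k}\sum_{\mathbf{k}'}\sqrt{\omega_\mathbf{k}\omega_{\mathbf{k}'}}\left(a_\mathbf{k}^\dagger(t) - a_\mathbf{k}(t)\right)\left(a_{\mathbf{k}'}^\dagger(t) - a_{\mathbf{k}'}(t)\right) \tag{17.52}
$$

which contains all possible pairs of $a_\mathbf{k}$’s and $a_\mathbf{k}^\dagger$’s.

Applying (17.25) to some initial state $|i\rangle$ gives the time evolution of the state

$$
\begin{aligned}
|\psi(t)\rangle = U(t,t_1)|i\rangle &= |i\rangle + \frac{1}{i\hbar}\int_{t_1}^{t}dt\,H_{\rm int}(t)|i\rangle \\
&= |i\rangle + \frac{i\,\Delta M}{4M_{\rm tot}}\sum_\mathbf{k}\sum_{\mathbf{k}'}\sqrt{\omega_\mathbf{k}\omega_{\mathbf{k}'}}
\int_{t_1}^{t}dt\,\left(a_\mathbf{k}^\dagger e^{i\omega_\mathbf{k}t} - a_\mathbf{k}e^{-i\omega_\mathbf{k}t}\right)\left(a_{\mathbf{k}'}^\dagger e^{i\omega_{\mathbf{k}'}t} - a_{\mathbf{k}'}e^{-i\omega_{\mathbf{k}'}t}\right)|i\rangle
\end{aligned}
\tag{17.53}
$$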

Let’s suppose the system starts out in some multi-particle eigenstate of the unperturbed Hamiltonian
$|i\rangle = |\ldots, n_\mathbf{k}, \ldots\rangle$. After applying the perturbation for some time $T$ the state will no longer
be in this pure eigenstate, but will contain a superposition of the original state and states obtained
from the initial state by applying a pair of creation and/or destruction operators. We can ask for the
amplitude that the system be in some specific eigenstate $|f\rangle = |\ldots, n'_\mathbf{k}, \ldots\rangle$ with a different set of
occupation numbers. For example, let’s ask for the amplitude $\langle f|U|i\rangle$ that the occupation numbers
differ only for two modes $\mathbf{k}_1$, $\mathbf{k}_2$ and where $n_1' = n_1 - 1$ and $n_2' = n_2 + 1$, i.e. an excitation has been
annihilated from mode $\mathbf{k}_1$ and one has been created in mode $\mathbf{k}_2$. With this choice of initial and final
states, the only terms in the double sum over operators which produce a non-zero contribution to
the amplitude are those containing the destruction operator for the mode $\mathbf{k}_1$ and a creation operator
for mode $\mathbf{k}_2$, for which we have

$$
\langle f|a_{\mathbf{k}_2}^\dagger a_{\mathbf{k}_1}|i\rangle = \sqrt{n_1(1+n_2)}, \tag{17.54}
$$

and we get an identical result if we change the order of these operators (since this is a scattering we
can assume $\mathbf{k}_1 \ne \mathbf{k}_2$, so the operators $a_{\mathbf{k}_2}^\dagger$ and $a_{\mathbf{k}_1}$ commute). Any other combination of operators
will produce a mixing into a state which is orthogonal to $|f\rangle$ and therefore gives zero when sandwiched
between $\langle f|$ and $|i\rangle$. Taking account of both possibilities (i.e. $\mathbf{k},\mathbf{k}' = \mathbf{k}_1,\mathbf{k}_2$ and $\mathbf{k},\mathbf{k}' = \mathbf{k}_2,\mathbf{k}_1$) gives
the transition amplitude

$$
\langle f|U|i\rangle = -\frac{i\,\Delta M}{2M_{\rm tot}}\sqrt{\omega_1\omega_2\,n_1(n_2+1)}\int dt\,e^{i(\omega_2-\omega_1)t}. \tag{17.55}
$$

For large $T$, the integral becomes $2\pi\delta(\omega_2 - \omega_1)$. Had we asked, instead, for the amplitude to make
the transition to the state $|n_1 + 1, n_2 + 1\rangle$, we would have had $2\pi\delta(\omega_2 + \omega_1)$, which vanishes because
the energies are positive.

The probability to make this transition in time T is the squared modulus of the amplitude: 

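$$
p(\mathbf{k}_1\to\mathbf{k}_2) = \frac{\Delta M^2}{4M_{\rm tot}^2}\,\omega_1\omega_2\,n_1(n_2+1)\int dt\int dt'\,e^{i(\omega_2-\omega_1)(t-t')}. \tag{17.56}
$$

For large $T$, the double integral becomes $\int dt\int d\tau\,e^{i(\omega_1-\omega_2)\tau} = 2\pi T\delta(\omega_1 - \omega_2)$ so the transition rate
(i.e. the transition probability per unit time) is

$$
R(\mathbf{k}_1\to\mathbf{k}_2) = \lim_{T\to\infty}\frac{p(\mathbf{k}_1\to\mathbf{k}_2)}{T} = \frac{\pi\,\Delta M^2}{2M_{\rm tot}^2}\,\omega_1\omega_2\,n_1(n_2+1)\,\delta(\omega_1-\omega_2). \tag{17.57}
$$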


The $\delta$-function here says that the phonon frequency, and therefore the energy, is unchanged in the
scattering; this is elastic scattering. More generally, if the perturbation only acts for finite time $T$,
the energy conserving $\delta$-function is replaced by a function with width $\delta\omega \sim 1/T$. Equation (17.57)
gives the rate for a transition to a specific final mode $\mathbf{k}_2$, and if we integrate over all modes at the
scattering frequency we can obtain the net scattering rate out of state $\mathbf{k}_1$. This is quite analogous
to Rayleigh scattering. Note that wave-momentum is not conserved here. This is to be expected
since the presence of the impurity violates the invariance of the system under translations.
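
The statement that a finite interaction time $T$ smears the $\delta$-function over a width $\sim 1/T$ can be checked numerically. The following is a minimal Python sketch, not part of the derivation; the function names `kernel` and `area` are ours. It uses $|\int_0^T dt\,e^{i\Delta\omega t}|^2/T = 4\sin^2(\Delta\omega T/2)/(\Delta\omega^2T)$, which should integrate to $2\pi$ over $\Delta\omega$ whatever the value of $T$, with its first zero at $\Delta\omega = 2\pi/T$.

```python
import math

def kernel(delta_omega, T):
    """|int_0^T exp(i*delta_omega*t) dt|^2 / T: the finite-T stand-in for 2*pi*delta(delta_omega)."""
    if abs(delta_omega) < 1e-12:
        return T  # limiting value at zero frequency mismatch
    return 4.0 * math.sin(0.5 * delta_omega * T) ** 2 / (delta_omega ** 2 * T)

def area(T, half_range=200.0, n=200_001):
    """Trapezoid-rule integral of the kernel over delta_omega in [-half_range, half_range]."""
    h = 2.0 * half_range / (n - 1)
    total = 0.0
    for j in range(n):
        w = -half_range + j * h
        weight = 0.5 if j in (0, n - 1) else 1.0
        total += weight * kernel(w, T)
    return total * h

if __name__ == "__main__":
    for T in (10.0, 50.0):
        print(f"T = {T}: peak = {kernel(0.0, T)}, area = {area(T):.4f}, 2*pi = {2 * math.pi:.4f}")
```

The peak height grows like $T$ while the width shrinks like $1/T$, so the area stays fixed; this is why the transition probability grows linearly with $T$ and a rate can be defined.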

A key feature here is the dependence of the scattering rate on the initial occupation numbers: 


$$
R(\mathbf{k}_1\to\mathbf{k}_2) \propto n_1(n_2+1). \tag{17.58}
$$


This is a result of profound significance. That the rate should scale with the occupation number $n_1$
in the initial mode is reasonable, but we see that the rate also depends non-trivially on the initial
occupation number $n_2$ of the final mode. This is the phenomenon of stimulated emission as originally
deduced by Einstein from considering a two-state system in thermal equilibrium with black-body 
radiation. Here we see it arising as a fundamental property of field theory — it will apply to any 
bosonic field and is quite general — and it says that if a certain state has a high occupation number 
then the probability that particles will scatter into that state is increased. 
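
The $n_1(n_2+1)$ scaling follows purely from the ladder-operator algebra, and a few lines of Python make this concrete. This is an illustrative sketch under our own conventions: a two-mode state is represented as a tuple of occupation numbers, and the helper names (`destroy`, `create`, `matrix_element`, `rate_factor`) are hypothetical, not notation from the text.

```python
import math

def destroy(mode, state):
    """Apply a_mode to a Fock state; returns (coefficient, new_state)."""
    n = state[mode]
    if n == 0:
        return 0.0, state  # annihilating an empty mode gives zero
    new = list(state)
    new[mode] = n - 1
    return math.sqrt(n), tuple(new)

def create(mode, state):
    """Apply a_mode^dagger to a Fock state; returns (coefficient, new_state)."""
    n = state[mode]
    new = list(state)
    new[mode] = n + 1
    return math.sqrt(n + 1), tuple(new)

def matrix_element(n1, n2):
    """<n1-1, n2+1| a_2^dagger a_1 |n1, n2>, cf. equation (17.54)."""
    c1, state = destroy(0, (n1, n2))
    c2, state = create(1, state)
    return c1 * c2 if state == (n1 - 1, n2 + 1) else 0.0

def rate_factor(n1, n2):
    """Occupation-number dependence of the scattering rate."""
    return matrix_element(n1, n2) ** 2

if __name__ == "__main__":
    # Same initial mode, empty versus occupied final mode.
    print(rate_factor(3, 0), rate_factor(3, 5))
```

With three quanta in the initial mode, filling the final mode with five quanta raises the rate factor from $3$ to $18$: the extra factor $n_2 + 1 = 6$ is the stimulated enhancement.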

Note that the wave-momentum is not conserved in scattering off an impurity, since the Lagrangian
density is not independent of position.


17.4.2 Self Interactions 

As discussed in the previous chapter, another interesting model for the interaction of a field is a 
self-interaction. In our ‘scalar-elasticity’ model this can be introduced by letting the $K$-springs have
a slightly ‘non-harmonic’ or non-quadratic behavior, so the contribution to the potential energy (and
therefore to the Hamiltonian) is

$$
V(\phi) = \frac{1}{2}K\phi^2 + \lambda\phi^4 \tag{17.59}
$$


for example. 

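This type of modification will again break the non-interacting nature of the normal modes. At
the classical level, the interaction term

$$
H_{\rm int} = \int d^3x\,\mathcal{H}_{\rm int} = \lambda\int d^3x\,\phi^4 \tag{17.60}
$$

will introduce a coupling between the otherwise independent normal modes. Expanding the field
as a sum over creation and destruction operators $\sum_\mathbf{k}\omega_\mathbf{k}^{-1/2}(a_\mathbf{k}^\dagger e^{i(\omega_\mathbf{k}t-\mathbf{k}\cdot\mathbf{x})} + a_\mathbf{k}e^{-i(\omega_\mathbf{k}t-\mathbf{k}\cdot\mathbf{x})})$ the
interaction Hamiltonian will now contain a four-fold sum over wave-vectors with terms consisting of
all possible combinations of four $a$’s or $a^\dagger$’s:

$$
H_{\rm int} \propto \lambda\int d^3x\sum_\mathbf{k}\sum_{\mathbf{k}'}\sum_{\mathbf{k}''}\sum_{\mathbf{k}'''}
\left(\frac{a_\mathbf{k}^\dagger e^{i(\omega_\mathbf{k}t-\mathbf{k}\cdot\mathbf{x})} + a_\mathbf{k}e^{-i(\omega_\mathbf{k}t-\mathbf{k}\cdot\mathbf{x})}}{\sqrt{\omega_\mathbf{k}}}\right)\ldots \tag{17.61}
$$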


where $\ldots$ stands for three more copies of the term in parentheses, but with $\mathbf{k}$ replaced by $\mathbf{k}'$, $\mathbf{k}''$
and $\mathbf{k}'''$. We can use this with the $S$-matrix expansion to compute the transition amplitude between
initial and final states with definite occupation numbers. As with scattering off an impurity, once
we specify the initial and final states, only a very limited subset of all of the possible combinations
of operators are relevant.

For example, consider initial and final states $|i\rangle = |n_1, n_2, n_3, n_4\rangle$ and $|f\rangle = |n_1 - 1, n_2 - 1, n_3 + 1, n_4 + 1\rangle$.
We are only labelling the modes for which the occupation numbers actually change.
This is a scattering reaction $\mathbf{k}_1\mathbf{k}_2 \to \mathbf{k}_3\mathbf{k}_4$. Obviously, the only effective combinations of operators
contain destruction operators for modes $\mathbf{k}_1$, $\mathbf{k}_2$ and creation operators for $\mathbf{k}_3$, $\mathbf{k}_4$ for which

$$
\langle f|a_{\mathbf{k}_4}^\dagger a_{\mathbf{k}_3}^\dagger a_{\mathbf{k}_2}a_{\mathbf{k}_1}|i\rangle = \sqrt{n_1n_2(1+n_3)(1+n_4)}. \tag{17.62}
$$

The operators can appear in any order, since we are assuming that all of the modes are distinct,
and this yields 4! identical contributions for the 4! ways to assign $\mathbf{k}$, $\mathbf{k}'$, $\mathbf{k}''$, $\mathbf{k}'''$ to $\mathbf{k}_1$, $\mathbf{k}_2$, $\mathbf{k}_3$ and
$\mathbf{k}_4$.

With the operator algebra taken care of we can proceed to performing the spatial and time 
integrals: 

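$$
\begin{aligned}
\langle f|U|i\rangle &= \langle f|\frac{1}{i\hbar}\int dt\int d^3x\,\mathcal{H}_{\rm int}|i\rangle \\
&\propto \lambda\sqrt{\frac{n_1n_2(n_3+1)(n_4+1)}{\omega_1\omega_2\omega_3\omega_4}}
\int_{t_1}^{t_2}dt\,e^{-i(\omega_1+\omega_2-\omega_3-\omega_4)t}\int d^3x\,e^{i(\mathbf{k}_1+\mathbf{k}_2-\mathbf{k}_3-\mathbf{k}_4)\cdot\mathbf{x}}
\end{aligned}
\tag{17.63}
$$

The spatial integral is $(2\pi)^3\delta^{(3)}(\mathbf{k}_1 + \mathbf{k}_2 - \mathbf{k}_3 - \mathbf{k}_4)$. This means that the sum of wave vectors of
the phonons is conserved in the scattering. We will assume that the interaction is on permanently,
in which case $t_1 \to -\infty$ and $t_2 \to \infty$ and the time integral becomes $2\pi\delta(\omega_1 + \omega_2 - \omega_3 - \omega_4)$, so the
sum of the temporal frequencies is also conserved in the interaction. Squaring the amplitude gives
a transition probability per unit time, or scattering rate, of

$$
R = \frac{p(\mathbf{k}_1,\mathbf{k}_2\to\mathbf{k}_3,\mathbf{k}_4)}{T} \propto \lambda^2\,\delta^{(4)}(\vec k_1 + \vec k_2 - \vec k_3 - \vec k_4)\,\frac{n_1n_2(1+n_3)(1+n_4)}{\omega_1\omega_2\omega_3\omega_4} \tag{17.64}
$$

where we have defined the ‘frequency-wave number 4-vector’ $\vec k = (\omega, \mathbf{k})$ (though this should not
be construed as a Lorentz 4-vector) and $\delta^{(4)}(\vec k)$ is shorthand for $\delta^{(3)}(\mathbf{k})\delta(\omega)$.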

This is a scattering of phonons off each other induced by non-linearity in the springs. We use the 
symbol shown in figure 17.1 to denote the contribution to the complex amplitude (17.63). This is a
single-vertex Feynman diagram with four external legs. A Feynman diagram tells us at a glance the 
initial and final states, and the nature of the interaction term (here the vertex with four emerging 
lines tells us that we are considering an interaction Hamiltonian consisting of the product of four 
fields). It also tells us at what order in the general S-matrix expansion we are working; here there is 
one vertex which tells us this is a first order contribution to the amplitude. It also happens to look 
like a space-time diagram of a collisional interaction between a pair of particles. 

The self-interaction scattering conserves total energy and wave-momentum, as it should since the 
classical Lagrangian density on which the theory is based has no explicit dependence on position. 

Conservation of the sum of the frequencies of the quanta involved is readily interpreted as conservation
of energy, since the energy of each mode is $(n + 1/2)\hbar\omega$. Conservation of the vector sum
of the wave-numbers similarly implies conservation of the total wave momentum. In what follows
we will sometimes refer to ‘energy $\omega$’ or ‘momentum $\mathbf{k}$’. In such blatantly dimensionally incorrect
statements we are implicitly setting $\hbar = 1$.

Figure 17.1: Feynman diagram for first order (i.e. single-vertex) scattering of phonons induced by
an an-harmonic spring potential energy $V(\phi) = K\phi^2/2 + \lambda\phi^4$.

Figure 17.2: Feynman diagrams for the decay of a chion (dotted line) into a pair of phions (solid
lines), and the inverse process. The interaction Hamiltonian is $\mathcal{H}_{\rm int} = \alpha\phi^2\chi$.


17.4.3 Second Order Scattering 

The examples above were processes that can be described at first order in perturbation theory. 
Many important scattering processes involve the exchange of a virtual particle, and the rates for 
these processes require second order perturbation theory. With a slight modification to our ‘scalar- 
elasticity’ model we can explore such processes. 

To this end, consider a medium that can support two types of phonon fields of the type we have 
been discussing: $\phi$ and $\chi$, each with their own free-field Lagrangian density, though with different
parameters $\mu \ne \mu_\chi$. Let the fields be coupled by a term

$$
\mathcal{H}_{\rm int} = \alpha\phi^2\chi \tag{17.65}
$$


where $\alpha$ is a coupling constant. It should be clear that this has the potential to describe first order
processes such as a ‘chion’ decaying to a pair of ‘phions’, or a pair of phions annihilating to form
a chion as shown in figure 17.2. If we denote the frequency of the $\chi$ field by $\Omega(\mathbf{q})$, with $\mathbf{q}$ the
spatial frequency, then the rate for such a process would involve an energy conserving $\delta$-function
$\delta(\omega(\mathbf{k}_1) + \omega(\mathbf{k}_2) - \Omega(\mathbf{q}))$ and the momentum conserving factor $\delta^{(3)}(\mathbf{k}_1 + \mathbf{k}_2 - \mathbf{q})$.

Now what if the minimum energy (i.e. $\hbar$ times the minimum frequency) of our chions is greater
than the sum of the energy of the available phions? This is easy to arrange; recall that the minimum
energy is set by the strength of the $K$-springs (or by the $-c\chi^2/2$ term in the continuous field
Lagrangian density). In that case the first order transition $\phi + \phi \to \chi$ cannot take place. There
is, however, still the possibility of interesting second-order effects such as phion-phion scattering
mediated by the exchange of a virtual chion. Calculating the evolution of the initial state in the
interaction picture at second order using (17.24) gives

$$
|\psi(t_f)\rangle = U|i\rangle = |i\rangle + \left(\frac{1}{i\hbar}\right)^2\int dt\,H_{\rm int}(t)\int^{t}dt'\,H_{\rm int}(t')|i\rangle \tag{17.66}
$$


where we have discarded the first order term. 

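In terms of the interaction Hamiltonian density these time integrals become integrals over space-time

$$
U|i\rangle = |i\rangle + \left(\frac{1}{i\hbar}\right)^2\int dt\int d^3x\,\mathcal{H}_{\rm int}(\mathbf{x},t)\int^{t}dt'\int d^3x'\,\mathcal{H}_{\rm int}(\mathbf{x}',t')|i\rangle \tag{17.67}
$$

or

$$
|\psi(t_f)\rangle = |i\rangle + \left(\frac{\alpha}{i\hbar}\right)^2\int dt\int d^3x\,(\phi^2\chi)_{\mathbf{x},t}\int^{t}dt'\int d^3x'\,(\phi^2\chi)_{\mathbf{x}',t'}|i\rangle \tag{17.68}
$$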

As usual, we now expand each of the field operators here as a sum of creation and destruction 
operators: 

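$$
\phi(\mathbf{x},t) = \sum_\mathbf{k}\omega(\mathbf{k})^{-1/2}\left(a_\mathbf{k}^\dagger e^{i(\omega_\mathbf{k}t-\mathbf{k}\cdot\mathbf{x})} + a_\mathbf{k}e^{-i(\omega_\mathbf{k}t-\mathbf{k}\cdot\mathbf{x})}\right) \tag{17.69}
$$

and

$$
\chi(\mathbf{x},t) = \sum_\mathbf{q}\Omega(\mathbf{q})^{-1/2}\left(a_\mathbf{q}^\dagger e^{i(\Omega_\mathbf{q}t-\mathbf{q}\cdot\mathbf{x})} + a_\mathbf{q}e^{-i(\Omega_\mathbf{q}t-\mathbf{q}\cdot\mathbf{x})}\right) \tag{17.70}
$$

where, for clarity, we have dropped the constant factor $\sqrt{\hbar/2M_{\rm tot}}$. Since there are six fields involved
this yields a six-fold sum over wave-vectors, each element of which consists of various combinations
of $a$s and $a^\dagger$s.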

Now let us specify the initial state as $|i\rangle = |\mathbf{k}_1, \mathbf{k}_2; 0\rangle$, by which we mean one phion with momentum
$\mathbf{k}_1$, and one with $\mathbf{k}_2$ and no chions, and the final state as $|f\rangle = |\mathbf{k}_3, \mathbf{k}_4; 0\rangle$. That is, we are
considering the scattering process $\mathbf{k}_1\mathbf{k}_2 \to \mathbf{k}_3\mathbf{k}_4$. The amplitude to make the transition $\langle f|U|i\rangle$ then
picks out a very limited subset of all the possible second order terms in our 6-fold sum. First, any
relevant term must contain a pair of destruction operators $a_{\mathbf{k}_1}$, $a_{\mathbf{k}_2}$ to annihilate the initial particles,
and a pair of creation operators $a_{\mathbf{k}_3}^\dagger$, $a_{\mathbf{k}_4}^\dagger$ to create the outgoing particles. Second, it must also
contain a creation operator $a_\mathbf{q}^\dagger$ to create a chion, with some as yet unspecified momentum $\mathbf{q}$, and
must then also contain a destruction operator $a_\mathbf{q}$ for the same momentum. This eliminates five of
the sums over momenta and we are left with a single sum over the momentum $\mathbf{q}$ of the virtual chion.
One such combination of operators which gives a non-zero contribution to the amplitude $\langle f|U|i\rangle$ is

$$
\left(a_{\mathbf{k}_4}^\dagger a_{\mathbf{k}_2}a_\mathbf{q}\right)\left(a_{\mathbf{k}_3}^\dagger a_{\mathbf{k}_1}a_\mathbf{q}^\dagger\right) \tag{17.71}
$$


where the first triplet is associated with the space-time point (x, t) and the second triplet with 
(x',t'). This would describe the scattering of phonons via the exchange of a chion, the Feynman 
diagram for which is shown in figure 17.3.

The contribution to the transition amplitude from this term is then 


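$$
\begin{aligned}
\langle f|U|i\rangle \sim{}& \alpha^2\sum_\mathbf{q}\frac{\langle\mathbf{k}_3,\mathbf{k}_4;0|a_{\mathbf{k}_4}^\dagger a_{\mathbf{k}_2}a_\mathbf{q}\,a_{\mathbf{k}_3}^\dagger a_{\mathbf{k}_1}a_\mathbf{q}^\dagger|\mathbf{k}_1,\mathbf{k}_2;0\rangle}{\sqrt{\omega_1\omega_2\omega_3\omega_4}\,\Omega_\mathbf{q}} \\
&\times\int dt\int d^3x\int^{t}dt'\int d^3x'\,
e^{i(\mathbf{k}_2-\mathbf{k}_4+\mathbf{q})\cdot\mathbf{x}}\,e^{i(\mathbf{k}_1-\mathbf{k}_3-\mathbf{q})\cdot\mathbf{x}'}\,
e^{i(\omega_4-\omega_2-\Omega_\mathbf{q})t}\,e^{i(\omega_3-\omega_1+\Omega_\mathbf{q})t'}
\end{aligned}
\tag{17.72}
$$

Performing the spatial integrations results in a pair of $\delta$-functions, and the expectation value is
$\langle\mathbf{k}_3,\mathbf{k}_4;0|\ldots|\mathbf{k}_1,\mathbf{k}_2;0\rangle = 1$ which takes care of all the operator arithmetic. Changing the integration
variable from $t'$ to $\tau = t' - t$, the amplitude becomes

$$
\begin{aligned}
\langle f|U|i\rangle \sim{}& \alpha^2(\omega_1\omega_2\omega_3\omega_4)^{-1/2}\sum_\mathbf{q}\Omega_\mathbf{q}^{-1}\,\delta^{(3)}(\mathbf{k}_2-\mathbf{k}_4+\mathbf{q})\,\delta^{(3)}(\mathbf{k}_1-\mathbf{k}_3-\mathbf{q}) \\
&\times\int dt\,e^{-i(\omega_1+\omega_2-\omega_3-\omega_4)t}\int_{-\infty}^{0}d\tau\,e^{i(\omega_3-\omega_1+\Omega_\mathbf{q})\tau}
\end{aligned}
\tag{17.73}
$$

Figure 17.3: Feynman diagram for second order scattering of a phonon (axes: time and space). The total Lagrangian density
here is assumed to be that for two free fields $\phi$ (solid lines) and $\chi$ (dashed) with an interaction term
$\mathcal{L}_{\rm int} = \alpha\phi^2\chi$.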


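The two three dimensional $\delta$-functions express conservation of 3-momentum at each of the vertices.
We can use one of these to perform the summation over $\mathbf{q}$, since $\sum_\mathbf{q}\delta^{(3)}(\mathbf{k}_1 - \mathbf{k}_3 - \mathbf{q})f(\mathbf{q}) = f(\mathbf{k}_1 - \mathbf{k}_3)$,
i.e. we erase this $\delta$-function, and the summation, and replace $\mathbf{q}$ by $\mathbf{k}_1 - \mathbf{k}_3$ throughout.
The remaining $\delta$-function becomes $\delta^{(3)}(\mathbf{k}_1 + \mathbf{k}_2 - \mathbf{k}_3 - \mathbf{k}_4)$ which conserves total wave-momentum.
The $dt$ integral similarly enforces conservation of total energy. All that remains is the $d\tau$ integral.
Introducing an infinitesimal positive constant $\delta$ to ensure convergence, and defining $\epsilon = \omega_1 - \omega_3$,
this is

$$
\int_{-\infty}^{0}d\tau\,e^{-i(\epsilon-\Omega_\mathbf{q})\tau} = \lim_{\delta\to0}\int_{-\infty}^{0}d\tau\,e^{-i(\epsilon-\Omega_\mathbf{q}+i\delta)\tau} = \frac{i}{\epsilon-\Omega_\mathbf{q}}. \tag{17.74}
$$

Combining the $\delta$-functions the amplitude is then

$$
\langle f|U|i\rangle \sim \frac{\alpha^2\,\delta^{(4)}(\vec k_1+\vec k_2-\vec k_3-\vec k_4)}{\sqrt{\omega_1\omega_2\omega_3\omega_4}}\,\frac{1}{\Omega_\mathbf{q}}\,\frac{1}{\epsilon-\Omega_\mathbf{q}}. \tag{17.75}
$$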


The final amplitude is then a rather simple function of the external phonon momenta and energies
(since $\epsilon = \omega_1 - \omega_3$ and $\mathbf{q} = \mathbf{k}_1 - \mathbf{k}_3$). However, what we have computed here is only part of the
story; we have only considered the one possible combination of operators in (17.71). We need now
to enumerate all of the allowed terms, and sum their contributions to the amplitude.

How many combinations like (17.71) are there that give a non-zero contribution? Let’s call the
vertex at $(t', \mathbf{x}')$ vertex A and that at $(t, \mathbf{x})$ vertex B. The $S$-matrix expansion stipulates that $t'$
precede $t$. Vertex A must then contain the chion creation operator and vertex B must contain the
corresponding chion destruction operator. The operators for the external phions can, however, be
assigned in any way we like, subject only to the condition that we have two phion operators per
vertex; this being dictated by the form of the interaction (17.65). The number of ways of connecting
the 4 external particles in pairs to two vertices is $4!/(2!)^2 = 6$. These are shown in figure 17.4.
Evidently there are three pairs of diagrams which differ only in that in the lower member of each
pair the right vertex has been ‘pulled back’ to precede the left vertex, and the vertex labels have
been swapped.
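
The counting can be made explicit with a short enumeration. This is a sketch in Python; the labels 1 to 4 for the external phions follow the text, while the variable names are ours. Choosing which two legs attach to the earlier vertex A fixes the other two at vertex B, and diagrams that differ only by exchanging the roles of A and B pair up.

```python
from itertools import combinations

legs = (1, 2, 3, 4)  # external phion labels

# Each time-ordered diagram: (legs at vertex A, legs at vertex B), with A earlier than B.
time_ordered = [
    (at_a, tuple(leg for leg in legs if leg not in at_a))
    for at_a in combinations(legs, 2)
]

# Ignoring which vertex comes first, diagrams pair up into distinct topologies.
topologies = {frozenset((at_a, at_b)) for at_a, at_b in time_ordered}

if __name__ == "__main__":
    for at_a, at_b in time_ordered:
        print(f"A: {at_a}  B: {at_b}")
    print(len(time_ordered), "time-ordered diagrams,", len(topologies), "topologies")
```

The six time-ordered diagrams collapse to three topologies, with legs $\{1,2\}$, $\{1,3\}$ or $\{1,4\}$ sharing a vertex.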

Consider the center pair of diagrams. We have already calculated the contribution to the amplitude
for the upper diagram. The lower diagram gives a very similar contribution. If we swap
primed by unprimed coordinates, all of the complex exponential factors associated with the external
particles are identical to what we had before. The only changes are that we must now integrate over
$t < t'$ (or equivalently $\tau > 0$) and that we must replace the chion creation operator associated with
the vertex at $(\mathbf{x}', t')$ with a destruction operator and vice versa. This has the effect of changing the
sign in the exponent for factors like $e^{i\Omega t}$ and $e^{i\mathbf{q}\cdot\mathbf{x}}$.

Figure 17.4: The six possible contributions to phonon-phonon scattering via a $\alpha\phi\phi\chi$ interaction
potential. These are ‘time-ordered’ processes; in each case the vertex A precedes B.

The contribution to the transition amplitude from both of these diagrams is 


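$$
\begin{aligned}
\langle f|U|i\rangle \sim{}& \frac{\alpha^2}{\sqrt{\omega_1\omega_2\omega_3\omega_4}}\sum_\mathbf{q}\int dt\int d^3x\int d^3x'\,
e^{i(\mathbf{k}_2-\mathbf{k}_4+\mathbf{q})\cdot\mathbf{x}}\,e^{i(\mathbf{k}_1-\mathbf{k}_3-\mathbf{q})\cdot\mathbf{x}'}\,e^{-i(\omega_1+\omega_2-\omega_3-\omega_4)t} \\
&\times\frac{1}{\Omega_\mathbf{q}}\left\{\int_{-\infty}^{0}d\tau\,e^{-i(\epsilon-\Omega_\mathbf{q})\tau} + \int_{0}^{\infty}d\tau\,e^{-i(\epsilon+\Omega_\mathbf{q})\tau}\right\}
\end{aligned}
\tag{17.76}
$$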


where, as before, $\epsilon = \omega_1 - \omega_3$; i.e. it is the amount of energy being transferred from the left-hand
phion to the one on the right. It might appear that we should have included the $e^{i\mathbf{q}\cdot\mathbf{x}}$ etc. factors
within the term in braces and with appropriate sign factors for the two cases $t' < t$ and $t' > t$.
However, this is not necessary since we are integrating over all possible $\mathbf{q}$, so we can replace, for
instance, $\sum_\mathbf{q}e^{i\mathbf{q}\cdot\mathbf{x}}$ by $\sum_\mathbf{q}e^{-i\mathbf{q}\cdot\mathbf{x}}$.

The dr integrals can be performed as above, to obtain 


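$$
\frac{1}{\Omega_\mathbf{q}}\left\{\int_{-\infty}^{0}d\tau\,e^{-i(\epsilon-\Omega_\mathbf{q})\tau} + \int_{0}^{\infty}d\tau\,e^{-i(\epsilon+\Omega_\mathbf{q})\tau}\right\}
= \frac{i}{\Omega_\mathbf{q}}\left[\frac{1}{\epsilon-\Omega_\mathbf{q}} - \frac{1}{\epsilon+\Omega_\mathbf{q}}\right]
= \frac{-2i}{\Omega_\mathbf{q}^2-\epsilon^2} \tag{17.77}
$$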
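
The sum of the two time-ordered $\tau$ integrals can be checked numerically. The sketch below is illustrative only: it keeps the convergence factor finite (a damping $e^{-\delta|\tau|}$ with $\delta = 0.02$, our choice) and uses a plain trapezoid rule; the function names and the sample values of $\epsilon$ and $\Omega_\mathbf{q}$ are assumptions made for the demonstration.

```python
import cmath

def damped_integral(freq, lower, upper, damping, n=60_000):
    """Trapezoid estimate of int_lower^upper exp(-i*freq*tau - damping*|tau|) dtau."""
    h = (upper - lower) / n
    total = 0j
    for j in range(n + 1):
        tau = lower + j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * cmath.exp(-1j * freq * tau - damping * abs(tau))
    return total * h

def two_orderings(eps, Omega, damping=0.02, cutoff=600.0):
    """(1/Omega) * { int_{-inf}^0 e^{-i(eps-Omega)tau} + int_0^inf e^{-i(eps+Omega)tau} }."""
    early = damped_integral(eps - Omega, -cutoff, 0.0, damping)  # chion emitted at t' < t
    late = damped_integral(eps + Omega, 0.0, cutoff, damping)    # chion emitted at t' > t
    return (early + late) / Omega

if __name__ == "__main__":
    eps, Omega = 0.6, 2.0
    print("numerical:", two_orderings(eps, Omega))
    print("-2i/(Omega^2 - eps^2):", -2j / (Omega ** 2 - eps ** 2))
```

The small real part that survives numerically is of order $\delta$ and vanishes as the damping is taken to zero.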


As before, the $dt$, $d^3x$ and $d^3x'$ integrals and the sum over $\mathbf{q}$ become a 4-dimensional $\delta$-function
conserving energy and total 3-momentum for the diagram as a whole. 

The final amplitude for this pair of diagrams taken together is then 


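$$
\langle f|U|i\rangle \sim \frac{\alpha^2\,\delta^{(4)}(\vec k_1+\vec k_2-\vec k_3-\vec k_4)}{\sqrt{\omega_1\omega_2\omega_3\omega_4}}\,\frac{1}{\Omega_\mathbf{q}^2-\epsilon^2}. \tag{17.78}
$$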


We symbolize this contribution by the center diagram in figure 17.5. The two vertices are drawn here at
the same time to emphasize that this diagram accounts for both of the two cases where the left
vertex precedes the right and vice versa. There are three such diagrams, as depicted in figure 17.5.
Each contains a factor $1/(\Omega_\mathbf{q}^2 - \epsilon^2)$ where $\mathbf{q}$ is the momentum transferred in the exchange and $\epsilon$ is
the energy transferred. For the left hand diagram, for example, $\epsilon = \omega_1 + \omega_2$, and $\mathbf{q} = \mathbf{k}_1 + \mathbf{k}_2$ while
for the right diagram $\epsilon = \omega_1 - \omega_4$, and $\mathbf{q} = \mathbf{k}_1 - \mathbf{k}_4$.

Figure 17.5: Diagrams for phonon-phonon scattering, each of which combines the contribution to
the amplitude from the pair of time-ordered graphs in figure 17.4. The diagram on the left, for
example, symbolizes the contribution from two separate time-ordered processes. In one, the pair of
phions 1, 2 annihilate to create a chion which later decays into the phions 3, 4. In the other, the
phions 3, 4 and the chion (which must have negative energy) are first created, and the chion later
merges with the phions 1 and 2.

Figure 17.6: An alternative view of the process shown in the left panel of figure 17.5.

Each diagram is topologically distinct from the others. Imagine the external lines as rubber 
bands connected to the corners of a frame, and the internal line as a rigid rod. We can hold the rod 
in any orientation, and look at it from any angle. This generates diagrams which look superficially 
different, but we do not add any extra contribution to the amplitude. For example, we could have
drawn the diagram on the left in figure 17.5 as in figure 17.6, but this is topologically equivalent.


That’s it — we’re done. Squaring the amplitude, summed over all topologically distinct diagrams, 
gives the transition probability, and thereby the scattering rate for this process. 

The final amplitude is very similar to that for 1st order scattering via a $\lambda\phi^4$ interaction (17.63).
In both cases we have conservation of total momentum and energy, and in both cases there are
factors $\omega^{-1/2}$ associated with each incoming and outgoing particle. Had we considered here initial
states with occupation number $n_i > 1$ or final states with initial occupation number $n_f > 0$ then we
would have also obtained an extra factor $\sqrt{n_1n_2(1+n_3)(1+n_4)}$ exactly as in (17.63).

An important new ingredient is the factor

$$
\frac{1}{\Omega^2(\mathbf{q}) - \epsilon^2} \tag{17.79}
$$


which we will call the chion propagator. It is a function of the incoming and outgoing particle energies
and momenta. Now the function $\Omega(\mathbf{q})$ is just the chion dispersion relation; it gives the angular
frequency for a real chion of momentum $\mathbf{p} = \hbar\mathbf{q}$. The relation between energy and momentum for
real chions is then

$$
E(\mathbf{p}) = \hbar\Omega(\mathbf{p}/\hbar). \tag{17.80}
$$


















Figure 17.7: Contour integral for evaluating the chion propagator.


For the free-field Lagrangian we have been considering the dispersion relation is $a\Omega^2(\mathbf{q}) = b|\mathbf{q}|^2 + c$,
so the real particle states lie on a 3-dimensional hyper-surface in energy-momentum 4-space; the
‘energy shell’. This is exactly analogous to relativistic particles which live on the ‘mass-shell’
$E^2 = p^2c^2 + m^2c^4$.


Now consider the left hand phion leg of figure 17.3. This particle comes in with energy $\hbar\omega_1$
and leaves with energy $\hbar\omega_3$ so it transfers to the chion energy $\Delta E = \hbar(\omega_1 - \omega_3)$. Similarly, it
transfers an amount of momentum $\Delta\mathbf{p} = \hbar(\mathbf{k}_1 - \mathbf{k}_3)$. We have seen that 3-momentum is exactly
conserved at each vertex. We can also say that energy is conserved, since whatever energy is
lost (or gained) in the interaction by the left-hand phion is gained (lost) by the right-hand phion.
However, for the situation we are considering here, there is insufficient energy to create a real chion
with momentum $\Delta\mathbf{p}$; i.e. $\Delta E < \hbar\Omega(\Delta\mathbf{p}/\hbar)$; the energy $\Delta E$ transferred by the chion exchange is not
the same as that obtained from the dispersion relation (17.80). This is what we mean when we say
that the exchanged chion is a virtual particle. The energy-momentum 4-vector lies off the energy
shell. (In the relativistic context, we say that the exchanged particle is ‘off mass-shell’). Note that
the strength of the interaction becomes larger the closer the exchanged energy is to that for a real
chion with momentum $\Delta\mathbf{p}$. If we increase the energy and momentum of the incoming particles we
will eventually reach energies where, if the outgoing momentum is sufficiently small, one can create
a real chion, and the interaction strength then formally diverges. This is not unphysical, however,
since in that case we should not have ignored the first order scattering amplitude.


Note also that for phion energies much less than $\hbar\Omega_{\rm min}$ the chion propagator is effectively
independent of the external particle energies. The reaction rate is then the same that one would
obtain from 1st order scattering with a $\lambda\phi^4$ interaction term with a suitable choice of coupling
constant $\lambda$. However, with increasing reactant energy one would see an increase in the reaction
rate which would signify that one is really dealing with collisions which are being mediated by the
exchange of virtual particles for which the minimum (real particle) energy is large.
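
The two regimes can be seen by evaluating the propagator directly. The following sketch assumes the dispersion relation $a\Omega^2 = b|\mathbf{q}|^2 + c$ with illustrative parameter values of our choosing ($a = b = 1$, $c = 25$, so $\Omega_{\rm min} = 5$ in units with $\hbar = 1$); the function names are hypothetical.

```python
import math

def chion_frequency(q, a=1.0, b=1.0, c=25.0):
    """Omega(q) from a*Omega^2 = b*q^2 + c; the parameter values are illustrative."""
    return math.sqrt((b * q * q + c) / a)

def propagator(eps, q, **params):
    """1/(Omega_q^2 - eps^2) for energy transfer eps and momentum transfer q (hbar = 1)."""
    Omega = chion_frequency(q, **params)
    return 1.0 / (Omega * Omega - eps * eps)

if __name__ == "__main__":
    contact = 1.0 / chion_frequency(0.0) ** 2  # low-energy limit, 1/Omega_min^2
    for eps, q in [(0.1, 0.2), (1.0, 1.0), (4.0, 0.5), (4.99, 0.0)]:
        print(f"eps={eps}, q={q}: propagator/contact = {propagator(eps, q) / contact:.3f}")
```

At low energies the ratio stays close to one, which is the effective $\lambda\phi^4$ regime, while it grows without bound as the energy transfer approaches the energy shell.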


17.4.4 Contour Integral Formalism 


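An alternative and elegant way to evaluate the transition amplitude is via contour integration. The
second line of (17.76) can be written as

$$
\frac{1}{\Omega_\mathbf{q}}\left\{\int_{-\infty}^{0}d\tau\,e^{-i(\epsilon-\Omega_\mathbf{q})\tau} + \int_{0}^{\infty}d\tau\,e^{-i(\epsilon+\Omega_\mathbf{q})\tau}\right\}
= \frac{i}{\pi}\int d\tau\int d\Omega\,\frac{e^{i(\Omega-\epsilon)\tau}}{\Omega^2 - (\Omega_\mathbf{q}-i\delta)^2}. \tag{17.81}
$$

Here $\Omega$ is a complex variable and the integral is to be taken along the real axis, as indicated in figure
17.7. The integrand has poles at $\Omega = \pm(\Omega_\mathbf{q} - i\delta)$. For $\tau > 0$ we can complete the contour in the
positive half of the complex $\Omega$ plane and we pick up the residue at $\Omega = -(\Omega_\mathbf{q} - i\delta)$ and for $\tau < 0$
we can complete the integral in the lower half of the plane and enclose the other pole.

The amplitude can then be written (relabelling $\mathbf{q} \to -\mathbf{q}$ and $\Omega \to -\Omega$, under which the propagator is unchanged) as

$$
\begin{aligned}
\langle f|U|i\rangle \sim{}& \frac{\alpha^2}{\sqrt{\omega_1\omega_2\omega_3\omega_4}}\int d^3q\int d\Omega\,\frac{1}{\Omega^2 - (\Omega_\mathbf{q}-i\delta)^2} \\
&\times\int dt\int d^3x\int dt'\int d^3x'\,
e^{i(\mathbf{k}_2-\mathbf{k}_4-\mathbf{q})\cdot\mathbf{x}}\,e^{i(\mathbf{k}_1-\mathbf{k}_3+\mathbf{q})\cdot\mathbf{x}'}\,
e^{i(\omega_4-\omega_2+\Omega)t}\,e^{i(\omega_3-\omega_1-\Omega)t'} \\
\sim{}& \frac{\alpha^2}{\sqrt{\omega_1\omega_2\omega_3\omega_4}}\int d^4q\,\frac{1}{\Omega^2-\Omega_\mathbf{q}^2}\,
\delta^{(4)}(\vec k_2 - \vec k_4 - \vec q)\,\delta^{(4)}(\vec k_1 - \vec k_3 + \vec q)
\end{aligned}
\tag{17.82}
$$

with $\vec q = (\Omega, \mathbf{q})$. This is an integral over energy-momentum space and the other integrals over the vertex coordinates
are now taken without let or hindrance over the entire space-time.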
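
As a check on the contour prescription, the $\Omega$ integral can be evaluated explicitly by residues. The following worked step is our own sketch; it writes $\Omega_0 \equiv \Omega_\mathbf{q} - i\delta$ for brevity (a symbol not used in the text) and takes $\delta \to 0$ in the final expressions.

```latex
% Integrand: e^{i(\Omega-\epsilon)\tau} / [(\Omega-\Omega_0)(\Omega+\Omega_0)], with \Omega_0 = \Omega_q - i\delta.
%
% \tau > 0: close in the upper half plane (counter-clockwise), enclosing \Omega = -\Omega_0:
\int d\Omega\,\frac{e^{i(\Omega-\epsilon)\tau}}{\Omega^2-\Omega_0^2}
  = 2\pi i\,\frac{e^{-i(\epsilon+\Omega_0)\tau}}{-2\Omega_0}
  \;\to\; -\frac{i\pi}{\Omega_q}\,e^{-i(\epsilon+\Omega_q)\tau}
%
% \tau < 0: close in the lower half plane (clockwise), enclosing \Omega = +\Omega_0:
\int d\Omega\,\frac{e^{i(\Omega-\epsilon)\tau}}{\Omega^2-\Omega_0^2}
  = -2\pi i\,\frac{e^{-i(\epsilon-\Omega_0)\tau}}{2\Omega_0}
  \;\to\; -\frac{i\pi}{\Omega_q}\,e^{-i(\epsilon-\Omega_q)\tau}
%
% Multiplying by i/\pi and integrating over \tau reproduces the two time-ordered terms:
\frac{i}{\pi}\int d\tau\int d\Omega\,\frac{e^{i(\Omega-\epsilon)\tau}}{\Omega^2-\Omega_0^2}
  = \frac{1}{\Omega_q}\left\{\int_{-\infty}^{0}d\tau\,e^{-i(\epsilon-\Omega_q)\tau}
  + \int_{0}^{\infty}d\tau\,e^{-i(\epsilon+\Omega_q)\tau}\right\}
```

The $-i\delta$ displacement of the poles supplies exactly the decaying factors $e^{-\delta|\tau|}$ that were inserted by hand to make the time-ordered integrals converge.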

This form for the amplitude is very convenient, particularly for computing more complicated 
graphs. If there are multiple internal lines then we have an integration over all energy and momentum 
values for each internal line. 


17.4.5 Feynman Rules 

Feynman realised that there was a rather mechanical procedure for generating the amplitude for a 
diagram. For our simple $\alpha\phi^2\chi$ model the rules are:

1. Write a factor $\omega^{-1/2}$ for each external leg.

2. Label all lines with momenta, and pencil in the assumed direction of momentum flow.

3. Write a factor $1/(\Omega^2 - \Omega_\mathbf{q}^2)$ for each internal line.

4. Write a total energy-momentum conserving $\delta$-function.

5. Integrate over all energy-momenta that are not determined by the external particle momenta. 
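
For the single-exchange diagrams considered here every internal energy-momentum is fixed by the external legs, so the rules reduce to arithmetic. The sketch below applies them to the three topologically distinct diagrams and sums the results, with the overall $\delta$-function stripped off. It is illustrative only: the dispersion parameters, the coupling $\alpha = 1$, and the function names are our assumptions, and we set $\hbar = 1$.

```python
import math

def phion_omega(k, a=1.0, b=1.0, c=1.0):
    """Phion dispersion relation a*omega^2 = b*|k|^2 + c (illustrative parameters)."""
    return math.sqrt((b * sum(x * x for x in k) + c) / a)

def chion_omega(q, a=1.0, b=1.0, c=100.0):
    """Chion dispersion relation with a much larger minimum frequency."""
    return math.sqrt((b * sum(x * x for x in q) + c) / a)

def tree_amplitude(k1, k2, k3, k4, alpha=1.0):
    """Sum of the three exchange diagrams, omitting the energy-momentum delta-function."""
    w1, w2, w3, w4 = (phion_omega(k) for k in (k1, k2, k3, k4))
    legs = 1.0 / math.sqrt(w1 * w2 * w3 * w4)           # rule 1: omega^(-1/2) per external leg
    add = lambda u, v: tuple(x + y for x, y in zip(u, v))
    sub = lambda u, v: tuple(x - y for x, y in zip(u, v))
    channels = [                                        # rule 2: (energy, momentum) on the internal line
        (w1 + w2, add(k1, k2)),   # 1 and 2 merge into the chion
        (w1 - w3, sub(k1, k3)),   # chion exchanged between lines 1->3 and 2->4
        (w1 - w4, sub(k1, k4)),   # chion exchanged between lines 1->4 and 2->3
    ]
    # rule 3: one propagator per internal line
    total = sum(1.0 / (chion_omega(q) ** 2 - eps ** 2) for eps, q in channels)
    return alpha ** 2 * legs * total

if __name__ == "__main__":
    # Head-on collision scattering through 90 degrees.
    print(tree_amplitude((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)))
```

The caller is responsible for supplying momenta that satisfy overall energy and momentum conservation (rule 4); the function does not check this.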

17.4.6 Kinematics of Scattering 

Conservation of total energy and momentum imposes important kinematic constraints on the allowed
interactions. For example, one immediate consequence is that the amplitude for processes where one
particle decays into several lower energy particles of the same kind is identically zero. To see this,
one can invoke the invariance under Lorentz-like transformations to transform into the frame where
the momentum of the initial particle vanishes. Since the energy of each outgoing particle is necessarily
at least its rest-energy, the total outgoing energy exceeds the rest-energy of the initial particle, and such a process cannot conserve energy.
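
A minimal numerical illustration of this argument, for a decay into two particles of the same kind, is given below. It assumes a relativistic-like dispersion relation $E^2 = p^2 + m^2$ in units where the wave speed is unity; the function names are ours. In the rest frame of the parent, momentum conservation forces the daughters to emerge back to back with momenta $\pm p$.

```python
import math

def energy(p, m=1.0):
    """Energy of a quantum with momentum p and rest-energy m (units with wave speed = 1)."""
    return math.sqrt(p * p + m * m)

def energy_deficit(p_out, m=1.0):
    """Energy of two back-to-back daughters (momenta +p_out, -p_out) minus the parent's rest-energy."""
    return 2.0 * energy(p_out, m) - energy(0.0, m)

if __name__ == "__main__":
    deficits = [energy_deficit(0.1 * j) for j in range(100)]
    print("smallest deficit:", min(deficits))  # never reaches zero, so the decay is forbidden
```

The deficit is smallest for daughters at rest, where it equals one full rest-energy, so there is no outgoing momentum for which the energy balances.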


17.4.7 Discussion 


The treatment here has been incomplete and somewhat superficial; we have ignored fermions entirely 
and we have considered only a rather simple bosonic field theory, a rather idealized ‘scalar-elasticity’ 
model. The point here is not to learn about real solids though, but to illustrate the way in which 
quantum field theories are constructed and how transition rates are computed. What we have 
sketched here is so-called ‘second-quantization’, which is rather different in flavor from elementary
quantum mechanics where one solves the Schroedinger equation for a wave-function. The program,
in outline, is to take some classical system (defined by its wave equation, or equivalently by its
Lagrangian density) as an idealized non-interacting system, find the normal modes and construct
the creation and destruction operators which act on the multi-particle states (15.11). One then adds
interactions, usually in the form of assumed weak couplings between different fields, expresses the
interaction Hamiltonian in terms of the $a$’s and $a^\dagger$’s and computes rates for transitions using the
$S$-matrix expansion. This proves to be very powerful.


17.5 Problems


17.5.1 Ladder Operators 


a) Compute the mean square displacement $\langle n|x^2|n\rangle$ for a simple harmonic oscillator $H = p^2/2m + m\omega^2x^2/2$
using the creation and destruction operators $a^\dagger$, $a$. Note that the energy eigenstates $|n\rangle$ are
orthonormal: $\langle m|n\rangle = \delta_{mn}$.

b) Show that the Heisenberg equations of motion for the creation and destruction operators $a^\dagger$,
$a$ are

$$
\begin{aligned}
da(t)/dt &= -i\omega a(t) \\
da^\dagger(t)/dt &= i\omega a^\dagger(t)
\end{aligned}
\tag{17.83}
$$



Chapter 18 

Relativistic Field Theory 


18.1 The Klein-Gordon Field 


The system we have been discussing is a mechanical one, with waves propagating on an underly¬ 
ing medium consisting of real rods, springs, beads etc. However, abstracting away the underlying 
medium and choosing suitable coefficients in the Lagrangian density it becomes immediately trans¬ 
formed into something much more interesting and exotic; a 3-dimensional relativistically covariant 
scalar field. 

• The scalar field is denoted by $\phi(\vec x) = \phi(\mathbf{x},t)$, and is a Lorentz invariant quantity. Different
observers assign different values to the coordinates $(\mathbf{x},t)$ of space-time events but they agree
on the value of the field $\phi$.

• The Lagrangian density for the scalar field is

$$
\mathcal{L} = \frac{1}{2}\dot\phi^2 - \frac{1}{2}(\nabla\phi)^2 - \frac{1}{2}m^2\phi^2. \tag{18.1}
$$

• This is in natural units such that $c = \hbar = 1$. In physical units

$$
\mathcal{L} = \frac{1}{2c^2}\dot\phi^2 - \frac{1}{2}(\nabla\phi)^2 - \frac{m^2c^2}{2\hbar^2}\phi^2. \tag{18.2}
$$

• The Lagrangian density has precisely the same form as the scalar elasticity model. However,
in that model, the field $\phi$ had dimensions of length. Here, since the Lagrangian density has
units of energy density $[\mathcal{L}] = ML^{-1}T^{-2}$ the field has units $[\phi] = M^{1/2}L^{1/2}T^{-1}$.

• The Lagrangian density (18.1) can be written

$$
\mathcal{L} = -\frac{1}{2}\partial_\mu\phi\,\partial^\mu\phi - \frac{1}{2}m^2\phi^2, \tag{18.3}
$$

which is clearly covariant. The Lagrangian density is a Lorentz scalar.


• The action integral is

$S = \int d^4x\, \mathcal{L}(\phi_{,\mu}, \phi)$    (18.4)

This is also Lorentz invariant.

• This yields the field equation

$\frac{\partial}{\partial x^\mu}\left(\frac{\partial\mathcal{L}}{\partial\phi_{,\mu}}\right) - \frac{\partial\mathcal{L}}{\partial\phi} = 0$    (18.5)

or

$\ddot\phi - \nabla^2\phi + m^2\phi = 0$    (18.6)

or again, equivalently,

$\Box\phi + m^2\phi = 0$    (18.7)

which is the Klein-Gordon equation for a massive scalar field.


• We are working here in ‘natural’ units such that both $\hbar$ and $c$ are unity. If we put these back
in, the field equation is $\ddot\phi/c^2 - \nabla^2\phi + (m^2c^2/\hbar^2)\phi = 0$.

• This is a purely classical field equation analogous to the electromagnetic field equations for the
field $\vec{A}$. Planck’s constant appears only as a parameter in the mass term.

• The traveling wave solutions of this equation have dispersion relation $\omega^2 = k^2 + m^2$ (or $\hbar^2\omega^2 =
\hbar^2c^2k^2 + m^2c^4$ in physical units) which we recognize as the relativistically correct relation
between energy and momentum for a particle of mass $m$. The scale $k_\star$ in the phonon model
now becomes the inverse of the Compton wavelength for the particle.


• Quantization of the normal modes of this classical equation yields massive, neutral spin-less 
non-interacting ‘phions’. 




• The propagator (17.79) becomes $1/(\vec{k}\cdot\vec{k} - m^2)$, which is again covariant.


• As discussed earlier, a complex field can represent charged particles, though we shall not 
explore that here. 


This is interesting. What started out as a simple physical model with beads on springs has become 
a relativistic massive scalar field. But aside from rescaling of the parameters in the Lagrangian it is 
the same physical system, so we can draw useful analogies between the concrete beads and springs 
system and the more abstract relativistic field. With further slight modifications, our beads and 
springs system can also illustrate some other interesting features of field theory. Let us explore a 
few of these. 

What if the K-spring were made an-harmonic? Specifically, what would happen if we were to
replace the potential energy term $K\phi^2/2$ (which becomes the mass term $m^2\phi^2/2$ in the scalar field
Lagrangian) by some more general potential

$V(\phi) = \frac{1}{2}m^2\phi^2 + \lambda\phi^4$    (18.8)

say? As discussed above, this leads to scattering of the quanta which would be described by a single-vertex
Feynman diagram with four legs. An interesting feature of the transition matrix element for
the relativistically covariant field is that the 1-dimensional energy conserving $\delta$-function and the
3-dimensional wave-number conserving $\delta$-function in (17.64) combine to form a 4-dimensional
$\delta$-function which ensures conservation of total 4-momentum in the scattering process.

What if the potential is of the form $V(\phi) = {\rm constant} - a\phi^2 + b\phi^4$ with $a$, $b$ positive constants?
This kind of w-shaped potential (see figure 18.1) leads to spontaneous symmetry breaking where at
low temperatures the field will want to settle into one of the two minima, and leads to the formation
of ‘domains’ within which the field is spatially constant, separated by ‘domain-walls’ where there
is a concentration of field gradients (and therefore potential energy). Higher dimensional fields are
possible, and these lead to other possible topological defects. These have been considered as possible
mechanisms for the formation of cosmological structures.
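The location of the two minima and the height of the central barrier follow directly from setting $dV/d\phi = 0$. The short sketch below works this out for the w-shaped potential; the numerical values of `a` and `b`, and the choice of the constant so that $V = 0$ at the minima, are illustrative assumptions rather than values taken from the text.

```python
import math

def w_potential(phi, a, b, v_const):
    """V(phi) = constant - a*phi**2 + b*phi**4 with a, b > 0."""
    return v_const - a * phi**2 + b * phi**4

def minima(a, b):
    """Minima from dV/dphi = -2*a*phi + 4*b*phi**3 = 0, i.e. phi = +/- sqrt(a/2b)."""
    phi0 = math.sqrt(a / (2.0 * b))
    return -phi0, phi0

# Illustrative (assumed) parameter values.
a, b = 1.0, 0.25
v_const = a**2 / (4.0 * b)          # chosen so that V = 0 at the minima
lo, hi = minima(a, b)

# Height of the central maximum above the minima: V(0) - V(phi0) = a^2 / 4b.
barrier = w_potential(0.0, a, b, v_const) - w_potential(hi, a, b, v_const)
print(lo, hi, barrier)
```

The field can sit in either minimum with the same energy, which is why different regions of space can end up in different domains.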

What if we have multiple scalar fields? Clearly if we add a new independent field $\psi$ to give a
total Lagrangian density $\mathcal{L} = \mathcal{L}(\phi_{,\mu}, \phi) + \mathcal{L}(\psi_{,\mu}, \psi)$ this will give decoupled non-interacting quanta.
However, what if we add an interaction term $\mathcal{L}_{\rm int} = -\alpha\phi^2\psi^2$? Such an interaction would allow
scattering of phions off psions via a single vertex diagram (i.e. a first order process). Similarly, a
term $\alpha\phi^2\psi$ would allow phion-phion scattering via the exchange of a virtual psion.

What if the Lagrangian for the second field $\psi$ has potential $V(\psi) = {\rm constant} - a\psi^2 + b\psi^4$ and this
field has settled into one of the asymmetric minima? This means that within any domain $\psi$ will be
constant, and from the point of view of the $\phi$-field there appears to be a potential $\alpha\psi^2\phi^2 = M_{\rm eff}^2\phi^2/2$
so this gives rise to an effective mass term for the phions even if the $\phi$ field started out massless. This
may sound baroque, but this is what underlies the Weinberg-Salam model for weak interactions
where the $\psi$ field is the Higgs field and the $\phi$ field represents the W or Z vector bosons, which
acquire a mass in just this manner.

Figure 18.1: Illustration of highly an-harmonic w-shaped potential that arises in theories with spontaneous
symmetry breaking.


18.2 Quantum Electrodynamics 


Other theories can be constructed by introducing different fields, which need not be scalars, and
specifying the couplings between them. One such example is quantum electrodynamics, where the
coupling is

$H_{\rm int} \propto \psi^\dagger A \psi$    (18.9)

where $\psi$ represents the field for an electron, which is a 4-component ‘spinor’ it turns out, and $A$ the
electromagnetic 4-potential.

This is different from the scalar field analysis since the electron is a fermion. Fermions obey the
exclusion principle. This means that the occupation numbers can take only the values 0 and 1,
and that the creation operator applied to an occupied state must vanish. This is accomplished by
replacing the commutation relation for bosonic ladder operators by an anti-commutation relation
with $[a, a^\dagger] \rightarrow \{a, a^\dagger\}$. This has the interesting consequence that the rates for scattering will involve
factors $1 - n_f$ where $n_f$ represents the initial occupation number of the final state, rather than $1 + n_f$;
the transition rate is proportional to the ‘unoccupation number’ of the final state. In many other
respects, interactions involving fermionic fields are treated much as above.
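For a single fermionic mode the ladder operators can be written as 2×2 matrices, which makes the exclusion principle easy to check by hand or by machine. The sketch below is a minimal illustration; the basis ordering (index 0 for the empty state, index 1 for the occupied state) and the helper names are assumptions made for this example.

```python
def matmul(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(p, q):
    """Sum of two 2x2 matrices."""
    return [[p[i][j] + q[i][j] for j in range(2)] for i in range(2)]

# Basis: index 0 = empty state |0>, index 1 = occupied state |1>.
a = [[0, 1], [0, 0]]        # destruction operator: a|1> = |0>, a|0> = 0
a_dag = [[0, 0], [1, 0]]    # creation operator:    a_dag|0> = |1>, a_dag|1> = 0

anticommutator = matadd(matmul(a, a_dag), matmul(a_dag, a))   # {a, a_dag}
number_op = matmul(a_dag, a)                                  # n = a_dag a
blocking = matmul(a, a_dag)                                   # equals 1 - n
double_creation = matmul(a_dag, a_dag)                        # must vanish

print(anticommutator, number_op, blocking, double_creation)
```

The matrix `blocking` has eigenvalues $1 - n$, which is the origin of the ‘unoccupation number’ factor in the rates.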


The 3-field interaction (18.9) does not admit any real single vertex reactions because 4-momentum
cannot be conserved. Real reactions appear in second order perturbation theory through terms like

$\psi^\dagger A^\dagger \psi \psi^\dagger A \psi$    (18.10)

which (reading right to left as usual) destroys an initial electron and photon, creates a virtual electron
(which may be off mass-shell) which then gets destroyed, followed by creation of an outgoing electron
and an outgoing photon. This type of term describes Compton scattering (figure 18.2). Other
products of factors describe scattering of electrons by exchange of a virtual photon etc. with diagrams
similar to those in figure 18.2. Because electrons are charged, not all of the processes depicted in
figure 17.5 are allowed. For electron-electron scattering, there is no analog of the left-hand diagram
since two electrons cannot annihilate to make a photon. Such a diagram does, however, contribute
to electron-positron scattering, but then only one of the other diagrams contributes since an electron
cannot transmute into a positron.

Figure 18.2: Feynman diagram for the lowest order contribution to the amplitude for Compton
scattering. Solid lines denote electrons and wiggly lines photons.


18.3 Connection to Kinetic Theory 


The reaction rates obtained as above provide a key input to kinetic theory, which describes the
evolution of the 6-dimensional phase space density for particles $f(\mathbf{r}, \mathbf{p})$.

The phase space density is very closely related to the occupation number $n_{\mathbf{k}}$. These were derived
for a unit volume box, for which the separation of states is $\delta k = 2\pi/L$. If the density of particles
is uniform in space, so $f$ is a function only of momentum, the number of particles in a 3-dimensional
volume element of momentum space is $N = f(\mathbf{p})L^3d^3p$, whereas, with $\mathbf{p} = \hbar\mathbf{k}$, the sum of the
occupation numbers for the modes in this volume is $N = \bar{n}\, d^3p/(2\pi\hbar/L)^3$ where $\bar{n}$ is the mean
occupation number, and therefore

$\bar{n} = f h^3.$    (18.11)

Now the mean number of reactions taking place in volume $V$ and time $T$ in which there are
initial particles in momentum space elements $d^3p_1$ and $d^3p_2$ and final particles in momentum space
elements $d^3p_3$ and $d^3p_4$ is then, for the $\lambda\phi^4$ self-interaction model,

$N(\vec{p}_1, \vec{p}_2 \rightarrow \vec{p}_3, \vec{p}_4) \propto VT\lambda^2\, \delta^{(4)}(\vec{p}_1 + \vec{p}_2 - \vec{p}_3 - \vec{p}_4)$
$\qquad \times\, n_1 n_2 (1 + n_3)(1 + n_4)\, \frac{d^3p_1}{E_1}\frac{d^3p_2}{E_2}\frac{d^3p_3}{E_3}\frac{d^3p_4}{E_4}$    (18.12)

where the energy factors arise from the frequency factors in the denominator in (17.64). This is
already manifestly covariant; the factor $VT$ is Lorentz invariant, the occupation number is proportional
to the phase space density which is also Lorentz invariant, and each of the momentum-space
volumes is paired with the corresponding energy in the combination $d^3p/E$ which is also Lorentz
invariant.

One can also express the number of reactions $N$ as minus the rate of change of $n_1$ with time
times the volume $V$, time interval $T$, and momentum space volume $d^3p_1$ as

$N(\vec{p}_1, \vec{p}_2 \rightarrow \vec{p}_3, \vec{p}_4) = -VT\dot{n}_1 d^3p_1.$    (18.13)

However, we also need to allow for the possibility of inverse reactions, where particles are removed
from modes $\mathbf{k}_3$ and $\mathbf{k}_4$ and particles are created in modes $\mathbf{k}_1$ and $\mathbf{k}_2$. This introduces an extra term
in the rates which has opposite sign, but has the input and output states exchanged. Both forward
and inverse reactions can be treated together if we replace the quantum mechanical statistical factor
$n_1 n_2 (1 + n_3)(1 + n_4)$ by

$n_1 n_2 (1 + n_3)(1 + n_4) - n_3 n_4 (1 + n_1)(1 + n_2).$    (18.14)

Fermionic reactants can be included here by changing plus signs to minus signs.

The rate of change of the mean occupation number is then given by

$E_1 \dot{n}_1 = -\lambda^2 \int\frac{d^3p_2}{E_2}\int\frac{d^3p_3}{E_3}\int\frac{d^3p_4}{E_4}\, \delta^{(4)}(\vec{p}_1 + \vec{p}_2 - \vec{p}_3 - \vec{p}_4)$
$\qquad \times\, \left(n_1 n_2 (1 + n_3)(1 + n_4) - n_3 n_4 (1 + n_1)(1 + n_2)\right).$    (18.15)

This is a remarkable and powerful equation. It is fully relativistic, since all of the factors are
explicitly Lorentz invariant, and it is also fully quantum mechanical, since it includes e.g. the
stimulated emission factors for bosons and Fermi-blocking for fermions. There are three similar
equations giving the rate of change of $n_2$, $n_3$ and $n_4$. This was obtained for a specific interaction, but
other reactions can be included by replacing the coupling constant $\lambda$ with the transition amplitude,
which, as we have seen, is also an invariant function of the particle 4-momenta.


• This is a deterministic system which can be integrated forward in time to compute the reactions
in a gas of relativistic particles. For example, one can split momentum space up into a finite
grid of cells and assign initial values to the occupation numbers. Then, for each cell $\mathbf{k}_1$ one
could loop over the subset of all triplets $\mathbf{k}_2$, $\mathbf{k}_3$, $\mathbf{k}_4$ such that $\vec{p}_1 + \vec{p}_2 - \vec{p}_3 - \vec{p}_4$ vanishes,
and increment or decrement $n_1$ appropriately for some small time interval $\Delta t$. Repeating
this laborious, but conceptually straightforward, process gives the evolution of the occupation
numbers $n(\mathbf{p})$ with time.

• The right hand side is really a 5-dimensional integral (three triple integrals containing four
$\delta$-functions). This makes sense. To compute the rate of loss of particles from mode $\mathbf{k}_1$ we
need to integrate over the momenta $\mathbf{k}_2$ of the other input particle. The output particles, being
real, have six degrees of freedom, four of which are fixed by energy-momentum conservation,
leaving two free variables to integrate out. These could be taken to be the direction of one of
the outgoing particles, for example.

• The output of this program is the number density of particles $n = \int d^3p\, f \rightarrow \int dE\, E\, p\, f$, the
energy density $\rho = \int dE\, E^2 p\, f$, and other quantities such as the pressure, entropy etc. can also
be extracted from the final occupation numbers.
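The looping procedure described in the first bullet can be sketched with a deliberately stripped-down toy model. In the block below the momentum grid is replaced by four equally spaced energy levels, only energy conservation is imposed (there is no momentum, and no $d^3p/E$ weighting), the coupling is set to unity, and a simple forward Euler step is used. All of these are assumptions made for illustration; this is not the full relativistic calculation.

```python
import math

M = 4                                        # toy grid: levels with energy E_i = i
energies = [float(i) for i in range(M)]

# All ordered reactions (i, j) -> (k, l) on the grid that conserve energy.
reactions = [(i, j, k, i + j - k)
             for i in range(M) for j in range(M) for k in range(M)
             if 0 <= i + j - k < M]

def rates(n):
    """dn_i/dt from the statistical factor (18.14), with unit coupling (assumed)."""
    dn = [0.0] * M
    for i, j, k, l in reactions:
        forward = n[i] * n[j] * (1 + n[k]) * (1 + n[l])
        inverse = n[k] * n[l] * (1 + n[i]) * (1 + n[j])
        dn[i] -= forward - inverse
    return dn

def evolve(n, dt=1e-3, steps=10000):
    """Forward Euler integration of the occupation numbers."""
    n = list(n)
    for _ in range(steps):
        n = [x + dt * d for x, d in zip(n, rates(n))]
    return n

n_initial = [1.5, 0.2, 1.0, 0.4]             # arbitrary (assumed) starting values
n_final = evolve(n_initial)

# For bosons in equilibrium, g_i = ln(n_i / (1 + n_i)) is linear in energy.
g = [math.log(x / (1 + x)) for x in n_final]
print(n_final, g)
```

In this toy model the total number and total energy are conserved by construction, and the occupation numbers relax towards a form in which $\ln[n/(1+n)]$ is linear in energy, as expected for a Bose-Einstein distribution.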


An important and direct consequence of this expression is that in equilibrium, where the mean 
rates must all vanish, we require that the statistical factor (18.14) must vanish and this implies 
that the mean occupation numbers as a function of energy are given by the Bose-Einstein and 
Fermi-Dirac expressions for the bosons and fermions respectively. 
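This statement is easy to verify numerically. The sketch below evaluates the statistical factor (18.14) for Bose-Einstein and Fermi-Dirac occupation numbers at four energies satisfying $E_1 + E_2 = E_3 + E_4$; the particular energies, temperature and chemical potential are arbitrary illustrative values (in units with $k = 1$), and the function names are hypothetical.

```python
import math

def occupation(energy, temperature, mu, sign):
    """Mean occupation number; sign=+1 is Bose-Einstein, sign=-1 is Fermi-Dirac."""
    return 1.0 / (math.exp((energy - mu) / temperature) - sign)

def statistical_factor(n1, n2, n3, n4, sign):
    """Forward minus inverse statistical factor; sign=-1 gives Fermi blocking."""
    return (n1 * n2 * (1 + sign * n3) * (1 + sign * n4)
            - n3 * n4 * (1 + sign * n1) * (1 + sign * n2))

# Assumed illustrative values with E1 + E2 = E3 + E4 (energy conservation).
E1, E2, E3, E4 = 1.0, 2.5, 0.7, 2.8
temperature, mu = 1.3, -0.5

factors = {}
for label, sign in (("bose", +1), ("fermi", -1)):
    n = [occupation(E, temperature, mu, sign) for E in (E1, E2, E3, E4)]
    factors[label] = statistical_factor(*n, sign)

# A non-equilibrium set of occupation numbers gives a non-zero net rate.
out_of_equilibrium = statistical_factor(1.0, 1.0, 0.1, 0.1, +1)
print(factors, out_of_equilibrium)
```

The cancellation works because $n/(1 \pm n) = e^{-(E-\mu)/kT}$ for both distributions, so the forward and inverse factors differ only by $e^{-(E_1+E_2-E_3-E_4)/kT} = 1$.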


18.4 The Scalar Field in an Expanding Universe 

Scalar fields are cherished by cosmologists. They have been invoked as the ‘inflaton’ which drives 
inflation; scalar fields have been considered as candidates for the dark matter (the classical example 
here being the ‘axion’), and the recent evidence for an accelerating universe has rekindled interest in 
the possibility that the universal expansion has recently become controlled by another inflaton-like 
field. 

To help understand how such fields behave in a cosmological context let us reformulate the
equations of motion in expanding coordinates. Specifically let us transform from physical spatial
coordinates $\mathbf{x}$ to ‘comoving’ coordinates $\mathbf{r}$ defined such that $\mathbf{x} = a(t)\mathbf{r}$ with $a(t)$ the cosmological
scale factor.

The action $S = \int d^4x\, \mathcal{L}(\dot\phi, \nabla_x\phi, \phi)$ becomes

$S = \int dt \int d^3r\, \mathcal{L}(\dot\phi, \nabla_r\phi, \phi, t)$    (18.16)

where now the Lagrangian, in terms of these new variables, is

$\mathcal{L}(\dot\phi, \nabla_r\phi, \phi, t) = \frac{a^3}{2}\dot\phi^2 - \frac{a}{2}(\nabla_r\phi)^2 - \frac{a^3m^2}{2}\phi^2.$    (18.17)

Taking the various partial derivatives and combining them in the usual way yields the Klein-Gordon
equation in expanding coordinates

$\ddot\phi + 3H\dot\phi - \frac{1}{a^2}\nabla_r^2\phi + m^2\phi = 0$    (18.18)

where $H = \dot{a}/a$ is the expansion rate. This differs most significantly from the equation in non-expanding
coordinates by the inclusion of the ‘damping term’ $3H\dot\phi$ which, as its name suggests,
causes the field amplitude to decay as the Universe expands.

To compute how the field amplitude decays we proceed as follows:

• We consider a single plane wave with (comoving) spatial frequency $\mathbf{k}$ and let $\phi(\mathbf{r}, t) = \phi_0(t)e^{i\mathbf{k}\cdot\mathbf{r}}$.

• We re-scale the field, and define an auxiliary variable $\psi(t) = a(t)^{3/2}\phi_0(t)$.

• This re-scaling gets rid of the ‘damping term’ and the equation of motion for $\psi$ is

$\ddot\psi + \Omega^2(t)\psi = 0.$    (18.19)

• This is an oscillator equation with time varying frequency

$\Omega^2(t) = k^2/a^2(t) + m^2$    (18.20)

where we have dropped some terms which become negligible if $k^2/a^2 + m^2 \gg H^2$, or equivalently
if the period of oscillation of the $\phi$ field is small compared to the age of the Universe
$t \sim 1/H$.

• We then apply adiabatic invariance, which tells us that the re-scaled field amplitude varies as
$\psi \propto \Omega(t)^{-1/2}$.

• We divide $\psi$ by $a^{3/2}$ to obtain the true field amplitude $\phi_0$.

There are two limiting cases of interest: The first is where $m \gg k/a$, or equivalently that the
physical wavelength $\lambda \sim a/k \gg 1/m$ or again equivalently $\lambda \gg \lambda_c$, the Compton wavelength. We
expect this to correspond to non-relativistic quanta. The frequency $\Omega$ is then nearly constant in
time so the amplitude of the re-scaled field is nearly constant and the amplitude of the real field
$\phi_0$ decays as $\phi_0 \propto a^{-3/2}$. The energy density is proportional to the kinetic energy term in the
Lagrangian density, and so the energy density of the particles decays as $\dot\phi^2 \sim \Omega^2\phi^2 \propto 1/a^3$. This is
just what one would expect for non-relativistic quanta: the energy of each particle is just the rest
mass, which is constant, and the number density falls as $1/a^3$ as the Universe expands.

The other extreme is $m \ll k/a$, corresponding to $\lambda \ll \lambda_c$. The frequency $\Omega$ then varies with the
expansion as $\Omega \simeq k/a \propto 1/a$ and the re-scaled field amplitude is $\psi \propto \Omega^{-1/2} \propto a^{1/2}$. The amplitude
of the true field then decays as $\phi \propto 1/a$ and the energy density is $\dot\phi^2 \sim \Omega^2\phi^2 \propto 1/a^4$. This
again is exactly what one would expect for relativistic quanta; the number of quanta dilutes as $1/a^3$
as before, but each quantum is losing energy proportional to $1/a$ so the density falls as $1/a^4$.
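The two limits can be read off from a single expression. Combining the steps above, the adiabatic amplitude is $\phi_0 \propto a^{-3/2}\Omega^{-1/2}$, so the energy density scales as $\Omega^2\phi_0^2 = \Omega/a^3$. The sketch below evaluates the logarithmic slope $d\ln\rho/d\ln a$ numerically; the function names and the sample values of $k$ and $m$ are assumptions for illustration, and overall normalization is ignored.

```python
import math

def energy_density(a, k, m):
    """Adiabatic estimate rho ~ Omega^2 phi0^2 with phi0 = a^(-3/2) Omega^(-1/2)."""
    omega = math.sqrt(k**2 / a**2 + m**2)
    phi0 = a**-1.5 * omega**-0.5
    return omega**2 * phi0**2

def log_slope(k, m, a=1.0, eps=1e-4):
    """d ln(rho) / d ln(a), estimated by central differences."""
    up = energy_density(a * (1 + eps), k, m)
    down = energy_density(a * (1 - eps), k, m)
    return (math.log(up) - math.log(down)) / (math.log(1 + eps) - math.log(1 - eps))

slope_nonrel = log_slope(k=1.0, m=1000.0)   # m >> k/a: expect about -3
slope_rel = log_slope(k=1000.0, m=1.0)      # m << k/a: expect about -4
print(slope_nonrel, slope_rel)
```

Intermediate values of $k/(am)$ give slopes between $-3$ and $-4$, interpolating between the matter-like and radiation-like behavior.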

The foregoing discussion applies to spatially incoherent field configurations representing quanta
of radiation or randomly moving thermal particles etc. The scalar field is bosonic, like the electromagnetic
field, and so can also appear in macroscopic configurations in a manner analogous to
macroscopic magnetic fields. In the long wavelength limit, the field equation is $\ddot\phi + 3H\dot\phi + m^2\phi = 0$.
This has the decaying oscillatory solutions $\phi \propto a^{-3/2}e^{imt}$ already discussed, provided $m \gg H$, but
at sufficiently early times or for sufficiently light fields this condition will be broken and the field
equation becomes effectively $\ddot\phi + 3H\dot\phi = 0$ which admits solutions with $\phi$ constant in time. These
nearly constant field configurations have the rather interesting property that the Lagrangian density,
and therefore the energy density, are constant. This means that such a field configuration is under
enormous tension, since the universal expansion must be doing work at a great rate to maintain the
constant mass-energy density. This is somewhat analogous to the tension along a static magnetic
field, but here we have isotropic tension. Another consequence, as we shall see, is that the universe
must expand exponentially if it is dominated by such a field, and this is the basis for inflation.


18.5 Non-Relativistic Scalar Fields 

It is interesting to look at the evolution of the scalar field in the non-relativistic limit. This corresponds
to long-wavelength variations such that $\nabla^2\phi \ll m^2\phi$ (or equivalently that the wavelength
greatly exceeds the Compton wavelength).

Since it is the $\nabla^2\phi$ term which couples the different oscillators, to a first approximation this
means that if we start with $\phi(\mathbf{r}, t = 0) = \phi_0(\mathbf{r})$ then the field will just sit there and oscillate at the
Compton frequency without spreading: $\phi(\mathbf{r}, t) \simeq \phi_0(\mathbf{r})e^{imt}$. This becomes exact in the limit that
$\lambda \rightarrow \infty$. For a large but finite wavelength disturbance we expect that the disturbance will tend to
diffuse away from the initial location.

To describe this mathematically let us factor out the common rapid oscillation factor $e^{imt}$ and
set

$\phi(\mathbf{r}, t) = \psi(\mathbf{r}, t)e^{imt} + \psi^*(\mathbf{r}, t)e^{-imt}$    (18.21)

where $\psi(\mathbf{r}, t)$ is a slowly varying field. By construction $\phi$ will be real. Taking the time derivative of
the field gives

$\dot\phi = im\psi e^{imt} - im\psi^* e^{-imt} + \dot\psi e^{imt} + \dot\psi^* e^{-imt}$    (18.22)

where the first two terms here are much larger than the last two. Taking a further time derivative
yields

$\ddot\phi = -m^2(\psi e^{imt} + \psi^* e^{-imt}) + 2im(\dot\psi e^{imt} - \dot\psi^* e^{-imt}) + O(\ddot\psi e^{imt}).$    (18.23)

The Laplacian of the field is

$\nabla^2\phi = \nabla^2\psi\, e^{imt} + \nabla^2\psi^* e^{-imt}$    (18.24)

and combining these in the field equation $\ddot\phi - \nabla^2\phi + m^2\phi = 0$ gives

$2im(\dot\psi e^{imt} - \dot\psi^* e^{-imt}) - \nabla^2\psi\, e^{imt} - \nabla^2\psi^* e^{-imt} = 0.$    (18.25)

Since $\psi$ is supposed to be relatively slowly varying compared to $e^{imt}$ this requires that both the
coefficient of $e^{imt}$ and of $e^{-imt}$ must vanish, which means that

$i\dot\psi - \frac{\nabla^2\psi}{2m} = 0$    (18.26)

which is just the Schroedinger equation.
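One way to see how good this approximation is, is to compare dispersion relations. A plane wave $\psi \propto e^{i(kx - \nu t)}$ in (18.26) requires $\nu = -k^2/2m$, so that $\phi \sim \psi e^{imt}$ oscillates at frequency $m + k^2/2m$, to be compared with the exact Klein-Gordon value $\sqrt{k^2 + m^2}$. The sketch below makes the comparison in natural units; the function names and sample wavenumbers are illustrative assumptions.

```python
import math

def omega_klein_gordon(k, m):
    """Exact dispersion relation omega^2 = k^2 + m^2 (natural units)."""
    return math.sqrt(k**2 + m**2)

def omega_schroedinger(k, m):
    """Frequency of phi ~ psi * exp(i m t) when psi obeys (18.26): m + k^2 / 2m."""
    return m + k**2 / (2.0 * m)

def frequency_error(k, m):
    """Difference between the approximate and exact frequencies."""
    return omega_schroedinger(k, m) - omega_klein_gordon(k, m)

m = 1.0
for k in (0.01, 0.1, 1.0):
    # The leading error term is k^4 / (8 m^3), negligible when k << m.
    print(k, frequency_error(k, m), k**4 / (8.0 * m**3))
```

The error grows as $k^4/8m^3$, so the Schroedinger description fails once the wavelength approaches the Compton wavelength.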

This is a familiar equation appearing in a perhaps unfamiliar context. More usually, this equation 
is used to describe the quantum mechanical wave function of a fermionic particle like the electron, 
whereas here it appears as the non-relativistic limit of a classical wave equation for a bosonic field! 

Returning to our beads and springs model, we have argued that if we set this oscillating with 
some long-wavelength amplitude modulation pattern then to first order the energy density (squared 
amplitude of the field) remains localized, but over long periods of time the energy will diffuse away 
from its initial location, and this is described by the Schroedinger equation. One consequence of 
this is that a non-relativistic scalar field, while being a fundamentally wave mechanical system, can 
mimic the behavior of classical non-relativistic particles by virtue of the correspondence principle. 

One can also introduce the effect of Newtonian gravity by letting the mass parameter $m \rightarrow m(1 +
\Phi(\mathbf{r}))$ where $\Phi(\mathbf{r})$ is the Newtonian gravitational potential. One then finds that the Schroedinger
equation contains an additional $m\Phi(\mathbf{r})\psi$ energy term. This could provide an alternative to N-body
integration for instance. Rather than treat a set of classical particles with some given initial
phase-space distribution function, one can set up some corresponding initial $\phi$ field with the same
macroscopic properties. Specifically this would mean that if one were to take a region of space which
is small, but still much larger than the wavelength, and Fourier transform the field, the resulting
energy density as a function of $\mathbf{k}$ should be proportional to $f(\mathbf{p})$. The gravitational field could then
be calculated using Poisson’s equation $\nabla^2\Phi = 4\pi G\rho$ with $\rho \propto |\psi|^2$. This provides a pair of coupled
equations with which we can evolve the system.
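For reference, the coupled pair can be written out explicitly. The form below is a sketch obtained by substituting $m^2 \rightarrow m^2(1 + 2\Phi)$ in the Klein-Gordon equation (dropping the $\Phi^2$ term, which assumes $|\Phi| \ll 1$), keeping the factorization (18.21) with the unperturbed $m$, and repeating the steps that led to (18.26). The sign of the potential term follows from the $e^{+imt}$ convention used above and should be checked against whichever convention is adopted.

```latex
\begin{align}
  i\dot\psi - \frac{\nabla^2\psi}{2m} + m\Phi\psi &= 0, \\
  \nabla^2\Phi &= 4\pi G\rho, \qquad \rho \propto |\psi|^2 .
\end{align}
```

Given $\psi$ at some time, the second equation determines $\Phi$, which is then used to advance $\psi$ through a short time step with the first.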


18.6 Problems 

18.6.1 Stress-Energy Tensor 

Consider the Lagrangian density $\mathcal{L}(\phi_{,\mu}, \phi)$ for a field $\phi(\vec{x})$ (as usual $_{,\mu}$ denotes partial derivative
with respect to space-time coordinates).

a) Write down the Euler-Lagrange equation for this system.

b) Use this to show that the partial derivative of the Lagrangian density with respect to space
time coordinate $x^\alpha$ can be written as

$\frac{\partial\mathcal{L}}{\partial x^\alpha} = \frac{\partial}{\partial x^\mu}\left(\frac{\partial\mathcal{L}}{\partial\phi_{,\mu}}\frac{\partial\phi}{\partial x^\alpha}\right)$    (18.27)

c) Show, thereby, that

$T^{\mu\nu}{}_{,\nu} = 0$    (18.28)

where the stress-energy tensor $T^{\mu\nu}$ is

$T^{\mu\nu} = \eta^{\mu\nu}\mathcal{L} - \eta^{\nu\alpha}\frac{\partial\mathcal{L}}{\partial\phi_{,\mu}}\frac{\partial\phi}{\partial x^\alpha}$    (18.29)

This analysis is very similar to the analysis of energy conservation in the dynamics section of the
notes (though technically considerably more challenging).

Just as the equation $j^\mu{}_{,\mu} = 0$ expresses the conservation of the total electric charge $Q = \int d^3x\, j^0$,
the equation $T^{\mu\nu}{}_{,\nu} = 0$ expresses the conservation of a four-vector $P^\mu = \int d^3x\, T^{0\mu}$; the total four-momentum
of the system. See Landau and Lifshitz “Classical Theory of Fields” §32 for a detailed
discussion.

18.6.2 Klein-Gordon Field 

a) For the Klein-Gordon Lagrangian density

$\mathcal{L} = -\frac{1}{2}\phi_{,\mu}\phi^{,\mu} - \frac{1}{2}m^2\phi^2$    (18.30)

show that the stress-energy tensor is

$T^{\mu\nu} = \partial^\mu\phi\,\partial^\nu\phi + \eta^{\mu\nu}\mathcal{L};$    (18.31)

that the time-time component of $T^{\mu\nu}$, which is the energy density, is

$\rho = T^{00} = \frac{1}{2}\left[\dot\phi^2 + (\nabla\phi)^2 + m^2\phi^2\right],$    (18.32)

and that the average of the spatial diagonal components, which is the pressure, is

$P = \frac{1}{3}T^{ii} = \frac{1}{2}\left[\dot\phi^2 - (\nabla\phi)^2/3 - m^2\phi^2\right].$    (18.33)

b) What is the relation between the pressure and density (the ‘equation of state’) for the following
field configurations?

1. A sea of incoherent random waves with $k \gg m$.

2. A sea of incoherent random waves with $k \ll m$.

3. A spatially uniform and static field $\phi$ = constant.

18.6.3 Scalar Field Pressure 

Starting from the Klein-Gordon Lagrangian density (in natural units)

$\mathcal{L} = \frac{1}{2}\left(\dot\phi^2 - (\nabla_x\phi)^2 - m^2\phi^2\right)$    (18.34)

obtain the Euler-Lagrange equation in an expanding Universe where comoving and physical coordinates
are related by $\mathbf{r} = \mathbf{x}/a(t)$. Here $\nabla_x$ denotes derivative with respect to physical spatial
coordinate.

The energy density for the field is

$\rho = \frac{1}{2}\left[\dot\phi^2 + \frac{1}{a^2}(\nabla\phi)^2 + m^2\phi^2\right]$    (18.35)

where $\nabla$ denotes derivative with respect to comoving spatial coordinate. In the ‘beads and springs’
analog what do the three terms represent?

Combine these to show that the rate of change of the density, averaged over some comoving
volume, is

$\frac{d\langle\rho\rangle}{dt} = -\frac{\dot{a}}{a}\left[3\langle\dot\phi^2\rangle + \frac{1}{a^2}\langle(\nabla\phi)^2\rangle\right] + \text{surface terms}$    (18.36)

where the surface terms become negligible if the averaging volume becomes large, and vanish if we
impose periodic boundary conditions.

Now the first law of thermodynamics (and $E = Mc^2$) imply that for a homogeneous expanding
Universe, the rate of change of the density is

$\frac{d\rho}{dt} = -3\frac{\dot{a}}{a}(\rho + P).$    (18.37)

Show thereby that the mean pressure is

$\langle P\rangle = \frac{1}{2}\left[\langle\dot\phi^2\rangle - \frac{\langle(\nabla\phi)^2\rangle}{3a^2} - m^2\langle\phi^2\rangle\right].$    (18.38)

What is the equation of state (relation between $P$ and $\rho$) for highly relativistic waves?

18.6.4 Domain Walls and Strings

a) Consider a scalar field $\phi(\mathbf{x}, t)$ with ‘spontaneous symmetry breaking’ potential

$V(\phi) = {\rm constant} - a\phi^2 + b\phi^4$    (18.39)

with $a$, $b$ positive constants. Let the minima of the potential be $V = 0$ at $\phi = \pm\phi_0$ and let $V(0) = V_0$.

• Sketch the minimum energy field configuration, if we require that $\phi = \pm\phi_0$ as $x_1 \rightarrow \pm\infty$.

• What is the width of the domain wall?

• What is the surface density of the wall?

b) Now consider a two component field with the analogous ‘wine bottle bottom’ potential:

$V(\phi_1, \phi_2) = {\rm constant} - a(\phi_1^2 + \phi_2^2) + b(\phi_1^2 + \phi_2^2)^2$    (18.40)

with potential minimum $V = 0$ at $\phi_1^2 + \phi_2^2 = \phi_0^2$ and again $V(0, 0) = V_0$.

• Describe the minimum energy field configuration, if we require that $(\phi_1, \phi_2) \rightarrow \phi_0 (x_1, x_2)/\sqrt{x_1^2 + x_2^2}$
as $x_1^2 + x_2^2 \rightarrow \infty$.

• What is the thickness of the string?

• What is the line density of the string?

Order of magnitude estimates are sufficient.



Part IV 
Matter


Chapter 19

Kinetic Theory 


Kinetic Theory provides the microscopic basis for gas dynamics. In the dilute gas approximation, 
such that the de Broglie wavelength is much less than the mean particle separation (which implies 
that the occupation number be small so stimulated emission and exclusion principle are unimportant) 
we can consider the gas to be a set of classical particles moving ballistically with occasional collisions. 

The collisions are described by the cross-section $\sigma$ which gives the rate of collisions and the
distribution of deflection angles for the particles. This may be computed quantum mechanically
or determined experimentally and is assumed here to be a given function of deflection angle and
relative velocity. We shall also focus on non-relativistic gases such that $kT \ll mc^2$.

We shall derive the Boltzmann equation which describes the evolution of the phase-space density
$f(\mathbf{r}, \mathbf{v})$ for atoms in the gas, first for a collisionless gas, and then study the effect of collisions to
obtain the Boltzmann transport equation. We discuss equilibrium solutions of this equation, and
Boltzmann’s H-theorem. We obtain the ideal gas laws as a limiting idealization in the case of short
mean free path, and discuss qualitatively the effects of viscosity and heat conduction.


19.1 The Collisionless Boltzmann Equation 

Consider a 6-dimensional volume $d^6w = d^3r\, d^3v$ of phase space (we shall assume equal mass particles,
so velocity and momentum are effectively equivalent). Imagine the volumes to be small compared
to the scale over which macroscopic conditions vary but still large enough to contain a huge number
of particles.

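The number of particles in $d^6w$ is $f(\mathbf{w})d^6w$. The rate of change of this number is the integral of
the flux of particles across the boundary of $d^6w$. For a finite volume $w$ we have

$\frac{d}{dt}\int_w d^6w\, f = -\int_S dS\, \mathbf{n}\cdot(f\dot{\mathbf{w}})$    (19.1)

with $S$ the surface of $w$ and $\mathbf{n}$ its normal. The 6-dimensional divergence theorem relates such a surface
integral to the volume integral of the 6-dimensional divergence, $\int dS\, \mathbf{n}\cdot(f\dot{\mathbf{w}}) = \int d^6w\, \nabla\cdot(f\dot{\mathbf{w}})$, and
therefore

$\int d^6w \left[\frac{\partial f}{\partial t} + \nabla\cdot(f\dot{\mathbf{w}})\right] = 0$    (19.2)

from which follows the 6-dimensional continuity equation

$\frac{\partial f}{\partial t} + \nabla\cdot(f\dot{\mathbf{w}}) = 0.$    (19.3)

This result can also be obtained by considering the flux of particles through the sides of a small
6-cube.

Writing this out in more detail

$\frac{\partial f}{\partial t} + \sum_{i=1}^{3}\frac{\partial(f\dot{x}_i)}{\partial x_i} + \sum_{i=1}^{3}\frac{\partial(f\dot{v}_i)}{\partial v_i} = 0$    (19.4)

or, expanding the derivatives and rearranging,

$\frac{\partial f}{\partial t} + \dot{\mathbf{x}}\cdot\frac{\partial f}{\partial\mathbf{x}} + \dot{\mathbf{v}}\cdot\frac{\partial f}{\partial\mathbf{v}} = -f\sum_{i=1}^{3}\left(\frac{\partial\dot{x}_i}{\partial x_i} + \frac{\partial\dot{v}_i}{\partial v_i}\right)$    (19.5)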


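but using Hamilton’s equations $\dot{x}_i = \partial H/\partial p_i$ and $\dot{p}_i = -\partial H/\partial x_i$ with $\mathbf{p} = m\mathbf{v}$ shows that the right
hand side vanishes identically. The left hand side is the convective derivative of the phase-space
density; it gives the rate of change of $f$ as seen by an atom moving with 6-velocity $\dot{\mathbf{w}} = (\dot{\mathbf{x}}, \dot{\mathbf{v}})$. We
denote this by $df/dt = Lf$ where the differential Liouville operator is $L = \partial/\partial t + \dot{\mathbf{x}}\cdot\partial/\partial\mathbf{x} + \dot{\mathbf{v}}\cdot\partial/\partial\mathbf{v}$
and we obtain the collisionless Boltzmann equation, or Liouville’s equation

$Lf = \left(\frac{\partial}{\partial t} + \dot{\mathbf{x}}\cdot\nabla_x + \dot{\mathbf{v}}\cdot\nabla_v\right)f = 0.$    (19.6)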


This equation says that the phase-space density behaves like an incompressible fluid; the phase- 
space density remains constant along the path of a particle. The space-density of particles may 
vary, but it must be accompanied by a change in the momentum-space density so that the product 
n(x)n(v) remains constant. The phase-space density in the vicinity of different particles will, in 
general, differ. 

This should not be confused with the invariance of $f(\mathbf{x}, \mathbf{v})$ under Lorentz boosts noted earlier. It
is however intimately related to the adiabatic invariance of $\oint p\, dq$. Consider a collection of particles
oscillating in a pig-trough. These particles will fill some region of the 2-dimensional phase space.
If we change the energy or the profile of the trough, this will change the boundary of the region of
phase-space they occupy, but the density of each volume of this ‘fluid’ remains constant, so the total
area of this region remains fixed.
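The incompressibility of the phase-space flow can be checked numerically for a single particle in one dimension by computing the Jacobian determinant of the map from initial to final $(x, v)$, which should be unity. The sketch below does this with a leapfrog integrator (each sub-step of which is a shear, and so preserves area) and finite differences. The anharmonic trough profile $V(x) = x^2/2 + x^4/4$, the time step and the starting point are all illustrative assumptions.

```python
def force(x):
    """Force in an anharmonic trough V(x) = x^2/2 + x^4/4 (assumed profile)."""
    return -x - x**3

def flow(x, v, dt=0.01, steps=500):
    """Advance (x, v) with kick-drift-kick leapfrog for steps*dt units of time."""
    for _ in range(steps):
        v += 0.5 * dt * force(x)
        x += dt * v
        v += 0.5 * dt * force(x)
    return x, v

def jacobian_determinant(x, v, h=1e-5):
    """det d(x', v')/d(x, v) of the flow map, by central differences."""
    xp, vp = flow(x + h, v)
    xm, vm = flow(x - h, v)
    dxdx, dvdx = (xp - xm) / (2 * h), (vp - vm) / (2 * h)
    xp, vp = flow(x, v + h)
    xm, vm = flow(x, v - h)
    dxdv, dvdv = (xp - xm) / (2 * h), (vp - vm) / (2 * h)
    return dxdx * dvdv - dxdv * dvdx

det = jacobian_determinant(1.0, 0.5)
print(det)   # close to 1: a small phase-space area element keeps its area
```

A small patch of phase space is sheared and rotated by the flow, but its area, and hence the density of particles within it, is unchanged.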


19.2 The Boltzmann Transport Equation 


To describe the effect of collisions we augment the collisionless Boltzmann equation with a collision
term

$Lf = \left(\frac{\partial}{\partial t} + \dot{\mathbf{x}}\cdot\nabla_x + \dot{\mathbf{v}}\cdot\nabla_v\right)f = \left(\frac{\partial f}{\partial t}\right)_{\rm coll}$    (19.7)

The instantaneous rate of change of $f(\mathbf{x}, \mathbf{v})$ due to collisions involves collisions which scatter particles
out of this region of phase space and collisions which scatter particles in. We define the loss rate $R$
such that $R\, d^3x\, d^3v$ is the number of collisions per unit time where one of the initial particles is in
$d^3x\, d^3v$ and the gain rate $\bar{R}$ so that $\bar{R}\, d^3x\, d^3v$ is the number of collisions per unit time where one of
the final particles is in $d^3x\, d^3v$, so the collision term can be written as

$\left(\frac{\partial f}{\partial t}\right)_{\rm coll}\delta t\, d^3x\, d^3v = N_{\rm gain} - N_{\rm loss} = (\bar{R} - R)\,\delta t\, d^3x\, d^3v$    (19.8)

Let us consider exclusively 2-body collisions of equal mass atoms which scatter particles labeled
1, 2 from velocities $\mathbf{v}_1$, $\mathbf{v}_2$ to $\mathbf{v}_1'$, $\mathbf{v}_2'$ as illustrated in figure 19.1. To obtain the loss rate from
particles with some specific velocity $\mathbf{v}_1$ we need to integrate over all possible velocities $\mathbf{v}_2$ for the
other colliding particle. Once $\mathbf{v}_1$ and $\mathbf{v}_2$ are specified, this leaves the six variables $\mathbf{v}_1'$, $\mathbf{v}_2'$ to be
determined. Momentum and energy conservation impose four constraints, leaving two variables to
fully determine the collision. We can take these to be the direction $\Omega = \hat{\mathbf{v}}_1'$ of particle 1 after the
collision. We define the differential cross-section $\sigma(\Omega)$ such that the number of collisions per unit
time per unit spatial volume between particles in streams with space densities $n_1 = f(\mathbf{v}_1)d^3v_1$ and
$n_2 = f(\mathbf{v}_2)d^3v_2$ and in which particle 1 is deflected into direction $d\Omega$ is

$\frac{dN}{dt\, d^3x} = n_1 n_2 |\mathbf{v}_1 - \mathbf{v}_2|\sigma(\Omega)d\Omega = d^3v_1\, d^3v_2\, f(\mathbf{v}_1)f(\mathbf{v}_2)|\mathbf{v}_1 - \mathbf{v}_2|\sigma(\Omega)d\Omega.$    (19.9)

Figure 19.1: Illustration of a collision between two particles where initial particles are in momentum
space cells labeled $\mathbf{v}_1$, $\mathbf{v}_2$ and end up in cells labeled $\mathbf{v}_1'$, $\mathbf{v}_2'$. In calculating the rate of change of
occupation number for cell $\mathbf{v}_1$ say, we have a negative term corresponding to the ‘forward’ reactions
as shown, but we also have a positive term arising from ‘inverse’ reactions. For interaction processes
of most interest, the ‘microscopic rates’ for the forward and inverse reactions are the same, and the
actual rates are then just proportional to $f_1 f_2$ and $f_1' f_2'$ respectively.


Integrating this over all $\mathbf{v}_2$ and $\Omega$ gives the loss term

$R\, d^3v_1 = d^3v_1\, f(\mathbf{v}_1)\int d^3v_2 \int d\Omega\, \sigma(\Omega)|\mathbf{v}_1 - \mathbf{v}_2| f(\mathbf{v}_2).$    (19.10)

The gain term is trickier, as we need to consider the inverse collisions $\mathbf{v}_1', \mathbf{v}_2' \rightarrow \mathbf{v}_1, \mathbf{v}_2$ where
one of the particles ends up in $d^3v_1$ and integrate over all possible values of $\mathbf{v}_1'$, $\mathbf{v}_2'$ and $\mathbf{v}_2$ consistent
with energy and momentum conservation. However, we know from field theory that the net rate of
reactions scattering particles out of state $\mathbf{v}_1$ (i.e. including backward reactions as a negative rate) is

$R(\mathbf{v}_1, \mathbf{v}_2 \rightarrow \mathbf{v}_1', \mathbf{v}_2') \propto d^3v_1\, d^3v_2\, d^3v_1'\, d^3v_2'\, \delta^{(4)}(\vec{p}_1 + \vec{p}_2 - \vec{p}_1' - \vec{p}_2')(n_1' n_2' - n_1 n_2)|T(\vec{p}_i)|^2.$    (19.11)

The transition amplitude $T$ is an invariant function of its arguments, and for electromagnetic
interactions is symmetric under exchange of initial and final states. Thus for real reactions we expect
the rates to be symmetric with respect to forward and backward collisions, and so we have for the
total collision term

$$\left(\frac{\partial f_1}{\partial t}\right)_{\rm coll} = \int d^3v_2 \int d\Omega\;\sigma(\Omega)\,|\mathbf{v}_1 - \mathbf{v}_2|\,(f_1' f_2' - f_1 f_2) \qquad (19.12)$$

where $f_1' = f(\mathbf{v}_1')$ etc. and where in the integral on the right $\mathbf{v}_1$ is assumed to be fixed, and $\mathbf{v}_1'$, $\mathbf{v}_2'$
are considered functions of $\mathbf{v}_1$, $\mathbf{v}_2$ and $\Omega$. Also, it should be noted that $\sigma(\Omega)$ is really also a function
of the relative speed of the particles.

19.3 Applications of the Transport Equation


19.3.1 Equilibrium Solutions 


In equilibrium, the collision term (19.12) must vanish, which implies

$$f_1' f_2' - f_1 f_2 = 0. \qquad (19.13)$$

This can be thought of as a conservation law

$$\log f_1' + \log f_2' = \log f_1 + \log f_2. \qquad (19.14)$$

Now we also have conservation of energy, or

$$(v_1')^2 + (v_2')^2 = v_1^2 + v_2^2. \qquad (19.15)$$

The equilibrium phase space distribution function is some function only of the speed $f = f(|\mathbf{v}|)$,
which is compatible with the above conservation laws if

$$\log f = a - b v^2. \qquad (19.16)$$


The coefficient $b$ can be related to the temperature using the Boltzmann formula. Since the
kinetic energy is $E = mv^2/2$ and $f$ measures the probability that a cell of phase-space is occupied
we must have

$$\frac{f(\mathbf{v}_1)}{f(\mathbf{v}_2)} = \exp(-b(v_1^2 - v_2^2)) = \exp(-\beta(E_1 - E_2)) \qquad (19.17)$$

so $b = \beta m/2 = m/(2kT)$.

The coefficient $a$ is a normalization factor and is fixed once we specify the number density of
particles

$$n = \int d^3v\; f(\mathbf{v}) \qquad (19.18)$$

and performing the integration gives

$$f(\mathbf{v}) = \frac{n}{(2\pi kT/m)^{3/2}}\; e^{-\frac{1}{2} m v^2/kT}. \qquad (19.19)$$


One can thereby show that the mean kinetic energy per particle is

$$\frac{\int d^3v\; f(\mathbf{v})\, m v^2/2}{\int d^3v\; f} = \frac{3}{2} kT. \qquad (19.20)$$
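
As a quick numerical check of (19.19) and (19.20), the following Python sketch integrates the Maxwellian over spherical shells in velocity space. The values chosen for $n$ and $kT/m$ are arbitrary assumptions, and the function names are purely illustrative.

```python
import math

def maxwellian(v, n, kT_over_m):
    """Maxwellian phase-space density f(v) for speed v, as in (19.19)."""
    norm = n / (2.0 * math.pi * kT_over_m) ** 1.5
    return norm * math.exp(-0.5 * v * v / kT_over_m)

def speed_moment(power, n, kT_over_m, vmax_sigma=12.0, steps=20000):
    """Integrate v**power * f(v) over velocity space using spherical shells
    (d^3v = 4 pi v^2 dv) and the midpoint rule."""
    sigma = math.sqrt(kT_over_m)
    dv = vmax_sigma * sigma / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * dv
        total += 4.0 * math.pi * v * v * v ** power * maxwellian(v, n, kT_over_m) * dv
    return total

n = 2.5            # number density (arbitrary units, assumed)
kT_over_m = 1.7    # kT/m (arbitrary units, assumed)

density = speed_moment(0, n, kT_over_m)
mean_v2 = speed_moment(2, n, kT_over_m) / density
# Mean kinetic energy per particle in units of kT: (m <v^2> / 2) / kT
mean_ke_over_kT = 0.5 * mean_v2 / kT_over_m
print(density, mean_ke_over_kT)
```

The first number should reproduce the input density and the second should be close to 3/2, whatever values are chosen for the two parameters.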


If we include the ‘quantum mechanical’ factors for the final states then the condition for
equilibrium becomes

$$f_1' f_2' (1 \pm h^3 f_1)(1 \pm h^3 f_2) - f_1 f_2 (1 \pm h^3 f_1')(1 \pm h^3 f_2') = 0 \qquad (19.21)$$

and one then obtains the Fermi-Dirac and Bose-Einstein distributions. One can similarly derive
other thermodynamic properties such as the pressure and the entropy density $s$, the latter being
given, aside from a constant, by

$$s = -\frac{1}{n} \int d^3v\; f(\mathbf{v}) \log h^3 f(\mathbf{v}) = -\langle \log h^3 f \rangle \qquad (19.22)$$

since one can easily show that with the equilibrium distribution function this integral is

$$\frac{1}{n} \int d^3v\; f(\mathbf{v}) \log h^3 f(\mathbf{v}) = -\frac{3}{2} \log\left(kT/n^{2/3}\right) + {\rm constant}. \qquad (19.23)$$

19.3.2 Boltzmann’s H-Theorem


Consider a gas of particles with uniform space density (so $\nabla_x f = 0$) and with no external force
acting (so $\dot{\mathbf{v}} \cdot \nabla_v f = 0$ also), so the Liouville operator is $L = \partial/\partial t$ and Liouville’s equation becomes

$$\frac{\partial f}{\partial t} = \left(\frac{\partial f}{\partial t}\right)_{\rm coll}. \qquad (19.24)$$

Let us define Boltzmann’s $H$-function

$$H(t) = \int d^3v\; f(\mathbf{v}, t) \log f(\mathbf{v}, t), \qquad (19.25)$$

where, for this to make sense, $f$ must be understood to be the numerical value of $f$ for some
particular choice of units, rather than a dimensional quantity. Also, $f$ must really be considered to
be the average of the occupation number $n$ over a large number of fundamental cells of size $h^3$ in
phase-space, since the actual occupation numbers are mostly zero in the dilute gas approximation.

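The time derivative of $H$ is

$$\frac{dH}{dt} = \int d^3v_1\; \frac{\partial f_1}{\partial t}\,(1 + \log f_1) \qquad (19.26)$$

or, using (19.12),

$$\frac{dH}{dt} = \int d^3v_1 \int d^3v_2 \int d\Omega\;\sigma(\Omega)\,|\mathbf{v}_1 - \mathbf{v}_2|\,(f_1' f_2' - f_1 f_2)(1 + \log f_1). \qquad (19.27)$$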


Now this 8-dimensional integral is really just a way of enumerating all the possible reaction paths,
and could have been written as

$$\frac{dH}{dt} \propto \int d^3v_1 \int d^3v_2 \int d^3v_1' \int d^3v_2'\;\delta^{(4)}(p_1 + p_2 - p_1' - p_2')\,(f_1' f_2' - f_1 f_2)\,|T(p_i)|^2\,(1 + \log f_1) \qquad (19.28)$$

which, aside from the last factor, is symmetric under exchanging particle labels, except for a factor
$-1$ if we switch final and initial state labels. This gives us four equivalent statements for $dH/dt$
which we can average to obtain

$$\frac{dH}{dt} = \frac{1}{4} \int d^3v_1 \int d^3v_2 \int d\Omega\;\sigma(\Omega)\,|\mathbf{v}_1 - \mathbf{v}_2|\,(f_1' f_2' - f_1 f_2)(\log f_1 f_2 - \log f_1' f_2'). \qquad (19.29)$$

Now since log is a monotonically increasing function of its argument the last two factors here have
opposite sign and so we obtain the $H$-theorem

$$\frac{dH}{dt} \le 0. \qquad (19.30)$$

Since, as we have seen, the quantity $H(t)$ is proportional to (minus) the entropy, we see that the
statistical mechanical entropy can never decrease. Also, the condition that the entropy should be
constant is the same as the condition for equilibrium. Therefore, if $f(\mathbf{v})$ differs from the Maxwellian
form, the entropy must necessarily increase until the phase-space density becomes Maxwellian.
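
The $H$-theorem can be illustrated with a toy model. The Python sketch below is not the full collision integral (19.12): it assumes a handful of cells labelled by an integer energy $k$, hypothetical two-body reactions $(i, j) \leftrightarrow (i+1, j-1)$ that conserve particle number and energy, and equal forward and inverse microscopic rates (set to unity). All names and the starting distribution are illustrative assumptions.

```python
import math

def h_function(f):
    """Discrete analogue of Boltzmann's H: sum of f log f over cells."""
    return sum(x * math.log(x) for x in f)

def step(f, dt):
    """One explicit Euler step of the toy collision term.

    Each reaction (i, j) <-> (i+1, j-1) conserves number and energy; the net
    forward rate is f_i f_j - f_{i+1} f_{j-1}, as for equal microscopic rates."""
    K = len(f)
    dfdt = [0.0] * K
    for i in range(K - 1):
        for j in range(1, K):
            net = f[i] * f[j] - f[i + 1] * f[j - 1]   # forward minus inverse
            dfdt[i] -= net
            dfdt[j] -= net
            dfdt[i + 1] += net
            dfdt[j - 1] += net
    return [x + dt * d for x, d in zip(f, dfdt)]

f_start = [0.05, 0.40, 0.05, 0.30, 0.05, 0.15]   # deliberately non-equilibrium
f = list(f_start)
history = [h_function(f)]
for _ in range(20000):
    f = step(f, dt=0.01)
    history.append(h_function(f))

ratios = [f[k + 1] / f[k] for k in range(len(f) - 1)]
print("H initial, final:", history[0], history[-1])
print("ratios f[k+1]/f[k]:", ratios)
```

In this model $H$ should fall monotonically and the occupation numbers should relax to $\log f = a - bk$, i.e. a constant ratio between neighbouring cells, which is the discrete counterpart of (19.16).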

We can perhaps see this more clearly if we consider a single reaction path $\mathbf{v}_1, \mathbf{v}_2 \leftrightarrow \mathbf{v}_1', \mathbf{v}_2'$ as in
figure 19.1. If the phase-space density for cell $\mathbf{v}_1$ changes by an amount $\Delta f_1$ then the contribution to
the $H$-function from that cell changes by an amount $\Delta H_1 = d^3v_1\,\Delta f_1 \log f_1$. Now $d^3v\,f$ is the space
number density of those particles falling in this cell, so if we take the total volume of space to be
unity, for a single forward interaction we have $d^3v_1\,\Delta f_1 = d^3v_2\,\Delta f_2 = -1$ and $d^3v_1'\,\Delta f_1' = d^3v_2'\,\Delta f_2' = +1$
so the total change in $H$ for a single forward reaction is

$$\Delta H = -\log f_1 - \log f_2 + \log f_1' + \log f_2' = \log f_1' f_2' - \log f_1 f_2. \qquad (19.31)$$

However, the mean net rate of forward reactions (i.e. the mean number of forward reactions minus
the mean number of backward reactions) is proportional to $f_1 f_2 - f_1' f_2'$ so we reach the conclusion
that, on average, collisions will cause $H$ to decrease (assuming the gas is not in equilibrium). This
is true regardless of which reaction pathway we consider so $dH/dt$ must be non-positive.
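
19.4 Conserved Quantities

If some quantity $\chi(\mathbf{v})$ is conserved in collisions, i.e.

$$\chi(\mathbf{v}_1) + \chi(\mathbf{v}_2) = \chi(\mathbf{v}_1') + \chi(\mathbf{v}_2'), \qquad (19.32)$$

then

$$\int d^3v\; \chi(\mathbf{v}) \left(\frac{\partial f}{\partial t}\right)_{\rm coll} = 0. \qquad (19.33)$$

To see this, we can use (19.12) to write the left hand side as

$$\int d^3v_1\; \chi(\mathbf{v}_1) \left(\frac{\partial f_1}{\partial t}\right)_{\rm coll} = \int d^3v_1 \int d^3v_2 \int d\Omega\;\sigma(\Omega)\,|\mathbf{v}_1 - \mathbf{v}_2|\,\chi_1\,(f_1' f_2' - f_1 f_2) \qquad (19.34)$$

and interchanging $1 \leftrightarrow 1'$ etc. as before we can write this as

$$\int d^3v\; \chi(\mathbf{v}) \left(\frac{\partial f}{\partial t}\right)_{\rm coll} = \frac{1}{4} \int d^3v_1 \int d^3v_2 \int d\Omega\;\sigma(\Omega)\,|\mathbf{v}_1 - \mathbf{v}_2|\,(\chi_1 + \chi_2 - \chi_1' - \chi_2')(f_1' f_2' - f_1 f_2) \qquad (19.35)$$

which vanishes by (19.32).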

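Using $Lf = (\partial f/\partial t)_{\rm coll}$ this can be written as

$$\int d^3v\; \chi(\mathbf{v}) \left(\frac{\partial}{\partial t} + \mathbf{v} \cdot \frac{\partial}{\partial \mathbf{x}} - \frac{\partial \Phi}{\partial \mathbf{x}} \cdot \frac{\partial}{\partial \mathbf{v}}\right) f = 0 \qquad (19.36)$$

where we have set $\dot{\mathbf{v}} = -\partial\Phi/\partial\mathbf{x}$, with $\Phi$ the gravitational potential, or as

$$\frac{\partial}{\partial t} \int d^3v\; \chi f + \frac{\partial}{\partial \mathbf{x}} \cdot \int d^3v\; \chi \mathbf{v} f - \frac{\partial \Phi}{\partial \mathbf{x}} \cdot \int d^3v \left(\frac{\partial (\chi f)}{\partial \mathbf{v}} - f \frac{\partial \chi}{\partial \mathbf{v}}\right) = 0 \qquad (19.37)$$

where we have used $\chi\,\partial f/\partial\mathbf{v} = \partial(f\chi)/\partial\mathbf{v} - f\,\partial\chi/\partial\mathbf{v}$. Now the first term in the last integral in (19.37)
vanishes since $f(\mathbf{v}) \rightarrow 0$ as $v \rightarrow \infty$. Defining the velocity average of a function $Q$ as

$$\langle Q \rangle = \frac{\int d^3v\; Q f}{\int d^3v\; f}, \qquad (19.38)$$

(19.37) becomes

$$\frac{\partial (n \langle \chi \rangle)}{\partial t} + \frac{\partial}{\partial \mathbf{x}} \cdot \left(n \langle \chi \mathbf{v} \rangle\right) + n \frac{\partial \Phi}{\partial \mathbf{x}} \cdot \left\langle \frac{\partial \chi}{\partial \mathbf{v}} \right\rangle = 0. \qquad (19.39)$$

There are 5 dynamical quantities which are conserved in collisions: the mass of the particles $m$,
the three components of the momentum $m\mathbf{v}$, and the energy $mv^2/2$. Substituting these in turn for
$\chi$ in (19.39) yields a set of 5 useful conservation laws.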


19.4.1 Mass Conservation 


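If we set $\chi = m$ in (19.39) and define the mass density $\rho = nm$ then the last term vanishes (since $m$
is independent of $\mathbf{v}$) and we obtain the continuity equation

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0 \qquad (19.40)$$

where we have defined the mean velocity (or ‘streaming velocity’) as

$$\mathbf{u} = \langle \mathbf{v} \rangle = \frac{\int d^3v\; f \mathbf{v}}{\int d^3v\; f}. \qquad (19.41)$$

Since $\nabla \cdot (\rho\mathbf{u}) = \rho \nabla \cdot \mathbf{u} + \mathbf{u} \cdot \nabla\rho$ we can write (19.40) as

$$\frac{D\rho}{Dt} = -\rho \nabla \cdot \mathbf{u} \qquad (19.42)$$

where we have defined the Lagrangian derivative $D/Dt = \partial/\partial t + \mathbf{u} \cdot \nabla$ so $DQ/Dt$ is the rate of
change of $Q$ as seen by an observer moving at the streaming velocity.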
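
19.4.2 Momentum Conservation

Setting $\chi(\mathbf{v}) = m\mathbf{v}$ in (19.39) yields

$$\frac{\partial (\rho u_i)}{\partial t} + \frac{\partial (\rho \langle v_i v_k \rangle)}{\partial x_k} = -\rho \frac{\partial \Phi}{\partial x_i}. \qquad (19.43)$$

Let $\mathbf{v} = \mathbf{u} + \mathbf{w}$, so $\mathbf{w}$ is the ‘random’ motion of the particle relative to the local mean velocity
$\mathbf{u}$, and define $\rho \langle w_i w_k \rangle = P\delta_{ik} - \pi_{ik}$ where

$$P = \rho \langle |\mathbf{w}|^2 \rangle / 3 \qquad (19.44)$$

is the ‘gas pressure’ and

$$\pi_{ik} = \rho \langle |\mathbf{w}|^2 \delta_{ik}/3 - w_i w_k \rangle \qquad (19.45)$$

is the ‘viscous stress-tensor’. What we are doing here is decomposing the velocity dispersion tensor
$\langle w_i w_k \rangle$ into a diagonal part describing isotropic kinetic pressure and the traceless part $\pi_{ik}$. Note
that $\pi_{ik}$ is symmetric and traceless and therefore has 5 degrees of freedom.

The momentum conservation law (19.43) then becomes

$$\frac{\partial (\rho u_i)}{\partial t} + \frac{\partial}{\partial x_k} \left(\rho u_i u_k + P\delta_{ik} - \pi_{ik}\right) = -\rho \frac{\partial \Phi}{\partial x_i} \qquad (19.46)$$

and if we combine this with (19.40) one obtains the ‘force equation’

$$\rho \frac{D\mathbf{u}}{Dt} = -\rho \nabla\Phi - \nabla P + \nabla \cdot \pi \qquad (19.47)$$

where $(\nabla \cdot \pi)_i = \partial \pi_{ik}/\partial x_k$.


19.4.3 Energy Conservation

Setting

$$\chi(\mathbf{v}) = \frac{1}{2} m v^2 = \frac{1}{2} m u^2 + m \mathbf{w} \cdot \mathbf{u} + \frac{1}{2} m w^2 \qquad (19.48)$$

in (19.39) yields the energy conservation law

$$\frac{\partial}{\partial t} \left[\frac{\rho}{2} \left(u^2 + \langle w^2 \rangle\right)\right] + \frac{\partial}{\partial x_k} \left[\frac{\rho}{2} \langle (u_k + w_k) |\mathbf{u} + \mathbf{w}|^2 \rangle\right] + \rho \frac{\partial \Phi}{\partial x_k} u_k = 0. \qquad (19.49)$$

Now

$$\langle (u_k + w_k)(u_i + w_i)(u_i + w_i) \rangle = u^2 u_k + 2 u_i \langle w_i w_k \rangle + u_k \langle w^2 \rangle + \langle w_k w^2 \rangle \qquad (19.50)$$

and defining the specific internal energy $\epsilon$ by

$$\rho\epsilon = \rho \langle w^2/2 \rangle = 3P/2 \qquad (19.51)$$

and the conduction heat flux $\mathbf{F}$

$$F_k = \frac{1}{2} \rho \langle w_k w^2 \rangle \qquad (19.52)$$

(19.49) gives the total energy equation

$$\frac{\partial}{\partial t} \left(\frac{1}{2} \rho u^2 + \rho\epsilon\right) + \frac{\partial}{\partial x_k} \left(\frac{1}{2} \rho u^2 u_k + u_i (P\delta_{ik} - \pi_{ik}) + \rho\epsilon u_k + F_k\right) = -\rho u_k \frac{\partial \Phi}{\partial x_k} \qquad (19.53)$$

which is a continuity equation where the first term is the time derivative of the total energy density,
the second term is the divergence of the energy flux and the term on the right hand side is the rate
at which external forces are doing work.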

Using (19.47) this can also be recast as the internal energy equation

$$\frac{\partial (\rho\epsilon)}{\partial t} + \frac{\partial (\rho\epsilon u_k)}{\partial x_k} = -P \frac{\partial u_k}{\partial x_k} - \frac{\partial F_k}{\partial x_k} + \Psi \qquad (19.54)$$

where we have defined the rate of viscous dissipation as

$$\Psi = \frac{1}{2} \pi_{ik} \left(\frac{\partial u_i}{\partial x_k} + \frac{\partial u_k}{\partial x_i}\right). \qquad (19.55)$$

Finally, using the equation of continuity this becomes an expression of the first law of
thermodynamics

$$\rho \frac{D\epsilon}{Dt} = -P \nabla \cdot \mathbf{u} - \nabla \cdot \mathbf{F} + \Psi \qquad (19.56)$$

where now the external potential $\Phi$ no longer appears and the bulk velocity $\mathbf{u}$ enters only through
its spatial derivatives. This says that the
rate of change of internal energy of a parcel of fluid comes from a combination of $PdV$ work, heat
conduction and dissipation of anisotropic shear.


19.5 Fluid Equations 


To recapitulate, taking moments of the Boltzmann transport equation has yielded

$$\left(\frac{\partial}{\partial t} + \mathbf{u} \cdot \nabla\right) \rho = -\rho \nabla \cdot \mathbf{u}$$
$$\rho \left(\frac{\partial}{\partial t} + \mathbf{u} \cdot \nabla\right) \mathbf{u} = -\rho \nabla\Phi - \nabla P + \nabla \cdot \pi \qquad (19.57)$$
$$\rho \left(\frac{\partial}{\partial t} + \mathbf{u} \cdot \nabla\right) \epsilon = -P \nabla \cdot \mathbf{u} - \nabla \cdot \mathbf{F} + \Psi$$

with constituent relations

$$\rho = m \int d^3v\; f(\mathbf{x}, \mathbf{v}, t)$$
$$\mathbf{u} = \langle \mathbf{v} \rangle$$
$$\epsilon = \frac{3}{2} P/\rho = \frac{1}{2} \langle w^2 \rangle \qquad (19.58)$$
$$\pi_{ik} = \rho \langle |\mathbf{w}|^2 \delta_{ik}/3 - w_i w_k \rangle$$
$$F_k = \frac{1}{2} \rho \langle w_k w^2 \rangle$$

This gives 5 equations, but unfortunately there are 13 unknowns: these are $\rho$, $\epsilon$, and the three
components of $\mathbf{u}$ plus the three components of the heat flux $\mathbf{F}$ and the 5 components of the viscous
stress tensor $\pi_{ik}$ (which is a trace-free symmetric tensor).

However, in the limit that the mean free path $\lambda$ is small compared to the scales $L$ over which
conditions are varying, progress can be made by making an expansion in powers of $\lambda/L$. In the
limit $\lambda \rightarrow 0$ collisions will be very effective and will force the velocity distribution to become locally
Maxwellian, on a short time-scale on the order of the collision time. Inhomogeneities on large scales
though will take much longer to damp out. Now for a Maxwellian, the random velocity distribution
is isotropic, so the viscous stress and the heat flux vanish. If we simply set $\mathbf{F} = 0$ and $\pi_{ik} = 0$
then one ends up with a set of 5 equations for the 5 unknowns $\rho$, $\epsilon$ and $\mathbf{u}$. These are known as the
ideal fluid equations, and the momentum equation is known as the Euler equation. Taking the ideal
fluid as a zeroth order solution, one can then compute the small anisotropy of the random velocity
distribution function to first order in the mean free path. For instance, if there is velocity shear,
then this will drive an anisotropy of the random velocity dispersion $\langle w_i w_j \rangle$ resulting in a viscous
stress $\pi \sim \rho \lambda \langle w^2 \rangle^{1/2} \nabla u$. Similarly, if there is a temperature gradient in the zeroth order solution
then this will give rise to a heat flux $\mathbf{F}$ etc. Including the effects of viscosity and heat conduction
results in the Navier-Stokes equations.


19.6 Problems 

19.6.1 Boltzmann distribution 

According to elementary field theory, and in the ‘dilute gas’ approximation, the net rate for reactions
$p_{i1} \ldots p_{iN} \leftrightarrow p_{f1} \ldots p_{fM}$ with $N$ incoming particles and $M$ outgoing particles (or vice versa for the
inverse reaction) contains a factor

$$\left(f_{i1} \cdots f_{iN} (1 \pm f_{f1}) \cdots (1 \pm f_{fM}) - f_{f1} \cdots f_{fM} (1 \pm f_{i1}) \cdots (1 \pm f_{iN})\right) \qquad (19.59)$$

where the $\pm$ signs apply for outgoing bosons and fermions respectively. This expression says that the
rate of reactions depends on the product of the densities of incoming particles, as expected classically,
and the $1 \pm f$ factors associated with the outgoing particles express quantum mechanical corrections
for stimulated emission (for bosons) and the exclusion principle (for fermions).

a. For a gas of particles which have isotropically distributed momenta in the lab frame the
occupation number $f$ is a function only of the energy of the particle. Show that the equilibrium
distribution function for the species $i$ is $f_i(E) = (e^{\beta E + \alpha_i} \mp 1)^{-1}$ where $\beta$ is the inverse
temperature and $\alpha_i$ is the chemical potential for species $i$ and the sum of the chemical potentials
for the incoming particles is equal to the sum for the outgoing particles.

b. Show that for a reaction $L_1, L_2 \leftrightarrow L_1', L_2', B$ (i.e. two $L$-particles scattering off each other,
creating in the process a bosonic $B$ particle) that $\alpha_B = 0$ and that the bosons must therefore
have a Planckian distribution.

19.6.2 Kinetic theory and entropy 

Write down expressions for the energy $U$ and pressure $P$ of an ideal monatomic gas of $N$ particles
at temperature $T$ in a volume $V$. Use these relations and the first law of thermodynamics
$dU = TdS - PdV$ to show that the entropy is given by $S = Nk(\frac{3}{2} \ln T + \ln V)$ + constant.

Now consider the statistical mechanical definition of the entropy density $s = -k \int d^3v\, f \ln f$ where
$f(\mathbf{v})$ is the phase space distribution function. For a Maxwellian distribution $f = A \exp(-mv^2/2kT)$,
compute the normalisation factor $A$ in terms of the temperature $T$ and the space density of
particles $n$, and compute the total statistical mechanical entropy $S = \int d^3x\, s$ and compare with the
thermodynamic result.

The collision term in the Boltzmann transport equation is

$$\left(\frac{\partial f(\mathbf{v}_1)}{\partial t}\right)_c = \int d^3v_2 \int d\Omega\;\sigma(\Omega)\,|\mathbf{v}_1 - \mathbf{v}_2|\,\left(f(\mathbf{v}_1') f(\mathbf{v}_2') - f(\mathbf{v}_1) f(\mathbf{v}_2)\right) \qquad (19.60)$$

Show that for a Maxwellian, $(\partial f/\partial t)_c = 0$.

19.6.3 Kinetic theory 

Compare the de Broglie wavelength and the mean separation of air molecules at atmospheric pressure; 
discuss the validity of a description of such a gas as a collection of effectively distinguishable particles 
following classical trajectories interrupted by brief collision events. 

Estimate the thermal speed for air molecules at room temperature. Estimate the mean free path
and collision time assuming a collision cross section $\sigma \sim 10^{-16}\,{\rm cm}^2$.

Estimate the thermal velocities of electrons and ions in ionised gas at $T \sim 10^4$ K. Estimate the
mean free time between hard electron-electron collisions (i.e. collisions which change the direction
of the electron’s motion substantially) and the corresponding mean free path assuming a density of
$n_e \sim 1\,{\rm cm}^{-3}$.

19.6.4 Massive neutrinos 

According to the standard big bang model the Universe is suffused with a thermal gas of neutrinos
which, at high redshift, has a number density and temperature essentially the same as the
photons which now comprise the 3K microwave background. These neutrinos, if they had a mass on
the order of ten electron volts, could provide the ‘missing mass’ inferred from dynamics of clusters
of galaxies etc.

a) Estimate the redshift at which these neutrinos became non-relativistic (i.e. the epoch when a
MBR photon had a typical energy $\sim 10$ eV) and estimate their phase space density at that time.

b) What does the collisionless Boltzmann equation tell us about the subsequent evolution of their
phase space density?

c) Use this to derive, to order of magnitude, the ‘Tremaine-Gunn’ bound in the size - velocity
dispersion plane for structures which are gravitationally bound by these particles.

d) Compare this limit with cores of galaxy clusters ($\sigma \sim 10^3$ km/s and $R \sim 0.25$ Mpc).



Chapter 20 

Ideal Fluids 


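Setting $\mathbf{F} = 0$ and $\pi_{ik} = 0$ in (19.57) yields the ideal fluid equations

$$\frac{D\rho}{Dt} = -\rho \nabla \cdot \mathbf{u}$$
$$\rho \frac{D\mathbf{u}}{Dt} = -\rho \nabla\Phi - \nabla P \qquad (20.1)$$
$$\frac{D\epsilon}{Dt} = -\frac{2}{3} \epsilon \nabla \cdot \mathbf{u}$$

These are known as the continuity equation, Euler’s equation, and the energy equation.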


20.1 Adiabatic Flows 


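If one multiplies the energy equation in (20.1) by $3\rho/2\epsilon$ and subtracts the continuity equation one
obtains

$$\frac{3}{2} \frac{\rho}{\epsilon} \frac{D\epsilon}{Dt} - \frac{D\rho}{Dt} = 0 \qquad (20.2)$$

or equivalently

$$\frac{D}{Dt} \left(\rho\,\epsilon^{-3/2}\right) = 0 \qquad (20.3)$$

which tells us that along a streamline the specific energy and the mass density are related by

$$\epsilon \propto \rho^{2/3}. \qquad (20.4)$$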


This makes physical sense. Each particle is bouncing off other particles. If there is a net contraction,
so $\nabla \cdot \mathbf{u} < 0$ in some region, then a particle will tend to gain energy as it bounces off particles moving
inwards.

This is much like what happens to a particle in a box with reflecting walls whose size is changing
with time so that $L = L(t)$. In each reflection off a wall perpendicular to the $x$-axis the $x$-component
of the velocity changes by $\Delta v_x = -\dot{L}$ and such reflections occur once per time $\Delta t = L/v_x$, and so
the rate of change of the velocity is

$$\frac{dv}{dt} = \frac{\Delta v}{\Delta t} = -\frac{v \dot{L}}{L} \quad\Rightarrow\quad \frac{dv}{v} = -\frac{dL}{L} \qquad (20.5)$$

with solution $v(t) \propto 1/L$. Since the density of particles scales as $\rho \propto 1/L^3$ the thermal energy scales
as $\epsilon \propto v^2 \propto 1/L^2 \propto \rho^{2/3}$ in accord with (20.4).
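
The scaling $v \propto 1/L$ can be checked with a simple event-by-event calculation. The Python sketch below makes a slightly different bookkeeping assumption from the argument above: one wall is held fixed at $x = 0$ and the other moves inwards at a constant speed, so each bounce off the moving wall adds twice the wall speed and happens once per round trip. The function name and the numerical values are illustrative only.

```python
def compress_box(v0, L0, wall_speed, L_final):
    """Follow a particle bouncing between a fixed wall at x = 0 and a wall at
    x = L(t) that moves inwards at constant speed `wall_speed`.

    Returns a list of (L, v) pairs recorded each time the particle is back
    at the fixed wall."""
    v, L = v0, L0
    record = [(L, v)]
    while L > L_final:
        t_out = L / (v + wall_speed)            # time to meet the approaching wall
        x_hit = v * t_out                       # position of the collision
        v = v + 2.0 * wall_speed                # elastic bounce off a moving wall
        t_back = x_hit / v                      # time to return to x = 0
        L = L - wall_speed * (t_out + t_back)   # wall position on return
        record.append((L, v))
    return record

record = compress_box(v0=1.0, L0=1.0, wall_speed=1.0e-3, L_final=0.5)
L_end, v_end = record[-1]
print("v*L at start and end:", record[0][0] * record[0][1], L_end * v_end)
print("energy ratio v^2:", v_end ** 2, " compare (L0/L)^2:", (1.0 / L_end) ** 2)
```

With this set-up the product $vL$, sampled at the fixed wall, stays constant, so the kinetic energy grows as $1/L^2$ as the box shrinks.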

Another line of argument that leads to the same result is to consider standing de Broglie waves 
in a cavity. 


Equation (20.4) is equivalent to constancy of the entropy. The first law of thermodynamics is
$dU = TdS - PdV$, so for constant $S$, $dU = -PdV$, but $U = M\epsilon$, $V = M/\rho$ where $M$ is the mass
of gas, so $M\,d\epsilon = -M P\,d(\rho^{-1})$ which, with $P = \frac{2}{3} \epsilon\rho$, gives $d\epsilon/\epsilon = (2/3)\,d\rho/\rho$ which again implies
$\epsilon \propto \rho^{2/3}$.

20.2 Hydrostatic Equilibrium

Setting $D\mathbf{u}/Dt = 0$ in Euler’s equation gives the equation of hydrostatic equilibrium

$$\nabla P = -\rho \nabla\Phi \qquad (20.6)$$

with $\Phi$ the gravitational potential.

For a stratified medium this becomes

$$\frac{dP}{dz} = \rho g \qquad (20.7)$$

with $\mathbf{g} = -\nabla\Phi$ the gravity (here $g$ is its $z$-component, which is negative if $z$ increases upwards).

More generally the density and potential are related by Poisson’s equation

$$\nabla^2 \Phi = 4\pi G \rho \qquad (20.8)$$

and combining this with (20.6) gives

$$\nabla \cdot \left(\frac{1}{\rho} \nabla P\right) = -\nabla^2 \Phi = -4\pi G \rho \qquad (20.9)$$

or, for a spherically symmetric system

$$\frac{1}{r^2} \frac{d}{dr} \left(\frac{r^2}{\rho} \frac{dP}{dr}\right) = -4\pi G \rho. \qquad (20.10)$$

With an assumed equation of state $P = P(\rho)$ for instance, (20.10) can be integrated to give
$P(r)$, $\rho(r)$ etc.
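
As an illustration of integrating (20.10), the Python sketch below assumes the particular equation of state $P = K\rho^2$ (a choice made here only because the answer is known in closed form). Writing $\rho = \rho_c \theta$ and $\xi = r \sqrt{2\pi G/K}$, equation (20.10) reduces to $\theta'' + (2/\xi)\theta' + \theta = 0$ with $\theta(0) = 1$, whose solution is $\theta = \sin\xi/\xi$. The function name, step size and starting radius are assumptions of this sketch.

```python
import math

def integrate_polytrope_n1(h=1.0e-3, xi0=1.0e-2):
    """Integrate theta'' + (2/xi) theta' + theta = 0 outwards with fourth-order
    Runge-Kutta until theta first reaches zero (the surface).

    Returns (xi_surface, samples) where samples is a list of (xi, theta)."""
    def rhs(xi, theta, dtheta):
        return dtheta, -theta - 2.0 * dtheta / xi

    # Start slightly off-centre using the series solution near xi = 0.
    xi = xi0
    theta = 1.0 - xi ** 2 / 6.0 + xi ** 4 / 120.0
    dtheta = -xi / 3.0 + xi ** 3 / 30.0
    samples = [(xi, theta)]
    while True:
        k1 = rhs(xi, theta, dtheta)
        k2 = rhs(xi + h / 2, theta + h / 2 * k1[0], dtheta + h / 2 * k1[1])
        k3 = rhs(xi + h / 2, theta + h / 2 * k2[0], dtheta + h / 2 * k2[1])
        k4 = rhs(xi + h, theta + h * k3[0], dtheta + h * k3[1])
        theta_new = theta + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dtheta_new = dtheta + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if theta_new <= 0.0:
            # Linear interpolation for the position of the zero crossing.
            return xi + h * theta / (theta - theta_new), samples
        xi, theta, dtheta = xi + h, theta_new, dtheta_new
        samples.append((xi, theta))

xi_surface, samples = integrate_polytrope_n1()
worst = max(abs(theta - math.sin(xi) / xi) for xi, theta in samples)
print("surface at xi =", xi_surface, " largest deviation from sin(xi)/xi:", worst)
```

The density should fall to zero at $\xi = \pi$, i.e. at a radius $r = \pi \sqrt{K/2\pi G}$ that is independent of the central density for this equation of state.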


20.3 Convective Stability 


Solutions of (20.10) are mechanically stable but may be convectively unstable. To determine whether
a solution $\rho(z)$, $P(z)$ is convectively stable, consider a blob of gas, and imagine displacing it upwards
by an amount $\Delta z$. After rising, the pressure on the blob has changed by an amount $\Delta P = \Delta z\,dP/dz$
(a decrease, since the pressure falls with height) and it will expand
(adiabatically) with $P \propto \rho^{5/3}$, so the fractional change in its density will be $\Delta\rho/\rho = (3/5) \Delta P/P$,
whereas the fractional change in the ambient density is $\Delta\rho/\rho = \Delta z\,(d\rho/dz)/\rho$. If, after rising, it finds
itself more buoyant than its surroundings (i.e. lower density) then it will continue to rise and is clearly
unstable. It is not difficult to see that the condition for stability is that the entropy of the gas should
increase with height, or equivalently $d\log P/d\log\rho < 5/3$.
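
The stability criterion is easy to apply to a tabulated stratification. The Python sketch below is a minimal example under stated assumptions: the two model atmospheres are hypothetical power laws $P \propto \rho$ and $P \propto \rho^2$ sampled at a few heights, and the helper names are illustrative.

```python
import math

GAMMA = 5.0 / 3.0   # adiabatic index of a monatomic ideal gas

def log_slopes(P, rho):
    """Finite-difference estimates of dlogP/dlogrho between adjacent layers."""
    return [
        (math.log(P[i + 1]) - math.log(P[i])) / (math.log(rho[i + 1]) - math.log(rho[i]))
        for i in range(len(P) - 1)
    ]

def is_convectively_stable(P, rho):
    """True if dlogP/dlogrho < 5/3 between every pair of adjacent layers."""
    return all(s < GAMMA for s in log_slopes(P, rho))

# Two hypothetical stratifications, sampled from the base upwards.
rho = [1.0, 0.8, 0.6, 0.4, 0.2]
isothermal = [r for r in rho]        # P proportional to rho   (slope 1)
steep = [r ** 2 for r in rho]        # P proportional to rho^2 (slope 2)

stable_isothermal = is_convectively_stable(isothermal, rho)
stable_steep = is_convectively_stable(steep, rho)
print(stable_isothermal, stable_steep)
```

The isothermal layer has a logarithmic slope of 1 and so satisfies the criterion, while the layer with slope 2 does not.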


20.4 Bernoulli’s Equation 

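Bernoulli’s Equation applies to steady flows for which $\partial\mathbf{u}/\partial t = 0$ (this should not be confused with
hydrostatic equilibrium, for which $D\mathbf{u}/Dt = 0$). Euler’s equation then becomes

$$(\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla\Phi - \nabla P/\rho \qquad (20.11)$$

but $(\mathbf{u} \cdot \nabla)\mathbf{u} = \frac{1}{2} \nabla u^2 - \mathbf{u} \times (\nabla \times \mathbf{u})$ so this is

$$\frac{1}{2} \nabla u^2 + (\nabla \times \mathbf{u}) \times \mathbf{u} + \nabla\Phi + \nabla P/\rho = 0. \qquad (20.12)$$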

Now under various conditions, the last term here is a total gradient. One such situation is if the
flow is adiabatic (i.e. isentropic) in which case $\nabla P/\rho$ is the gradient of the specific enthalpy. The total
enthalpy $H$ is defined to be $H = U + PV$, and its derivative is $dH = dU + PdV + VdP = TdS + VdP$,
and dividing by the mass gives $dh = T ds + dP/\rho$. Thus, for adiabatic gas, $\nabla P/\rho = \nabla h$, where

$$h = \epsilon + P/\rho \qquad (20.13)$$

is the specific enthalpy. The quantity $\nabla P/\rho$ is also a total gradient for any barotropic equation of
state $P = P(\rho)$, and we shall assume this is the case.

Figure 20.1: Illustration of the Bernoulli effect. An ideal fluid moves through a venturi tube.
Constancy of flux means the velocity must be highest at the waist of the tube. In a steady state
this requires that the fluid be accelerated, which requires a pressure gradient so that the pressure
be lowest where the velocity is highest.

If we dot equation (20.12) above with the unit vector $\hat{\mathbf{u}}$ the term involving $\nabla \times \mathbf{u}$ drops out, and
noting that $\hat{\mathbf{u}} \cdot \nabla$ is the derivative along the direction of motion, which we will write as $d/dl$, we
have

$$\frac{d}{dl} \left(\frac{1}{2} u^2 + \Phi + h\right) = 0. \qquad (20.14)$$


Now the specific enthalpy increases with the pressure (for a monatomic ideal gas $h = \frac{5}{2} P/\rho$), which means that, if we neglect
the effect of gravity, an increase in fluid or gas velocity must be accompanied by a drop of pressure.
This is called the Bernoulli effect and is used in the carburetor where the air is channelled through a
‘venturi’ or constricting nozzle. Since the flux of gas is fixed, this requires that the gas velocity scale
inversely as the cross-sectional area, resulting in a drop in pressure in the nozzle. This pressure drop
causes fuel to be sucked through the jet into the air-stream. The phenomenon is also said to occur
when ships travel parallel to one another. Here the water is forced to flow faster between the ships,
causing a drop in pressure which causes the ships to be attracted to one another. This also explains
why if one opens a car window a crack, air will be sucked out. Finally, it can be used effectively to
separate pages of a book or newspaper simply by blowing past the leaves to cause a drop in pressure.
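
To get a feeling for the size of the effect, the Python sketch below evaluates the pressure drop in a venturi under the simplifying assumptions of steady, incompressible flow with gravity neglected, so that $u^2/2 + P/\rho$ is constant along a streamline and the velocity scales inversely with the cross-sectional area. The numerical values are illustrative assumptions, not data from any particular device.

```python
def venturi_pressure_drop(rho, u_in, area_in, area_throat):
    """Return (P_in - P_throat, u_throat) for steady incompressible flow,
    using constancy of flux and u^2/2 + P/rho = constant along a streamline."""
    u_throat = u_in * area_in / area_throat          # flux conservation
    drop = 0.5 * rho * (u_throat ** 2 - u_in ** 2)   # Bernoulli, gravity neglected
    return drop, u_throat

# Assumed numbers: air of density 1.2 kg/m^3 entering at 10 m/s, with a throat
# whose area is one quarter of the inlet area.
drop, u_throat = venturi_pressure_drop(rho=1.2, u_in=10.0, area_in=4.0e-4, area_throat=1.0e-4)
print("throat speed (m/s):", u_throat, " pressure drop (Pa):", drop)
```

For these numbers the throat speed is four times the inlet speed and the pressure drop is of order a kilopascal, about one percent of atmospheric pressure, which also indicates that treating the air as incompressible is a reasonable approximation here.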


20.5 Kelvin’s Circulation Theorem 


We can write the Euler equation as

$$\frac{\partial \mathbf{u}}{\partial t} + \frac{1}{2} \nabla u^2 + (\nabla \times \mathbf{u}) \times \mathbf{u} = -\nabla\Phi - \nabla P/\rho. \qquad (20.15)$$

If we take the curl of this, and assume adiabaticity, so $\nabla P/\rho = \nabla h$, or more generally assume a
barotropic equation of state $P = P(\rho)$, then all but the first and third terms on the left hand side
drop out. Defining the vorticity $\boldsymbol{\omega} = \nabla \times \mathbf{u}$ we obtain

$$\frac{\partial \boldsymbol{\omega}}{\partial t} + \nabla \times (\boldsymbol{\omega} \times \mathbf{u}) = 0. \qquad (20.16)$$

Figure 20.2: Kelvin’s circulation theorem concerns the rate of change of $\Gamma$, the integral of the vorticity
over a surface, as the boundary of the surface moves with the fluid flow. This figure illustrates that
the change in the surface as it moves a distance $\mathbf{u}\,dt$ can be considered to be the ribbon-like strip
connecting the old loop to the new loop.


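Now consider the circulation

$$\Gamma = \oint \mathbf{u} \cdot d\mathbf{l} \qquad (20.17)$$

which is defined for a closed loop. Using Stokes’ theorem this can be expressed as

$$\Gamma = \int dA\; (\nabla \times \mathbf{u}) \cdot \mathbf{n} = \int dA\; \boldsymbol{\omega} \cdot \mathbf{n} = \int d\mathbf{A} \cdot \boldsymbol{\omega}. \qquad (20.18)$$

How does $\Gamma$ change with time for a loop which moves with the fluid? The value of $\Gamma$ for a given
loop does not depend on the surface that one chooses. The simplest way to compute the change in
$\Gamma$ as we move the loop a small distance $\mathbf{u}\,\delta t$ along the flow is to take the new surface to be the old
surface plus a ribbon-like wall which connects the old loop to the new loop as illustrated in figure
20.2. The change in $\Gamma$ will then consist of two terms; the change in $\Gamma$ for the old surface if the
velocity field has changed with time plus the contribution from the ribbon:

$$\delta\Gamma = \delta t \int d\mathbf{A} \cdot \frac{\partial \boldsymbol{\omega}}{\partial t} + \int \delta\mathbf{A} \cdot \boldsymbol{\omega}. \qquad (20.19)$$

The element of surface of the ribbon shown in figure 20.2, oriented consistently with the loop, is

$$\delta\mathbf{A} = \mathbf{u}\,\delta t \times d\mathbf{l} \qquad (20.20)$$

so in the second integral $\delta\mathbf{A} \cdot \boldsymbol{\omega} = \delta t\,(\mathbf{u} \times d\mathbf{l}) \cdot \boldsymbol{\omega} = \delta t\; d\mathbf{l} \cdot (\boldsymbol{\omega} \times \mathbf{u})$ (since $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$ is the volume
of the parallelepiped with edges $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, which is unchanged by a cyclic permutation of the vectors) and so
dividing through by $\delta t$ gives

$$\frac{d\Gamma}{dt} = \int d\mathbf{A} \cdot \frac{\partial \boldsymbol{\omega}}{\partial t} + \oint d\mathbf{l} \cdot (\boldsymbol{\omega} \times \mathbf{u}). \qquad (20.21)$$

Using Stokes’ theorem to convert the loop integral to a surface integral gives

$$\frac{d\Gamma}{dt} = \int d\mathbf{A} \cdot \left[\frac{\partial \boldsymbol{\omega}}{\partial t} + \nabla \times (\boldsymbol{\omega} \times \mathbf{u})\right] \qquad (20.22)$$

but by (20.16) the integrand vanishes and we have Kelvin’s circulation theorem

$$\frac{d\Gamma}{dt} = 0 \qquad (20.23)$$

which tells us that the circulation is conserved.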

20.6 Potential Flows

Consider a stream of ideal fluid and assume that the vorticity at some point $\mathbf{r}$ upstream vanishes.
Now according to (20.16), the convective derivative of the vorticity is

$$\frac{D\boldsymbol{\omega}}{Dt} = \frac{\partial \boldsymbol{\omega}}{\partial t} + (\mathbf{u} \cdot \nabla)\boldsymbol{\omega} = -\nabla \times (\boldsymbol{\omega} \times \mathbf{u}) + (\mathbf{u} \cdot \nabla)\boldsymbol{\omega} \qquad (20.24)$$

which implies that if $\boldsymbol{\omega} = 0$ initially (i.e. at $\mathbf{r}$) then it will stay that way and must vanish at all points
on the streamline passing through $\mathbf{r}$.

One can reach the same conclusion using Kelvin’s circulation theorem for loops of various
orientation in the vicinity of $\mathbf{r}$ (which have $\Gamma = 0$) and following them downstream.

If the vorticity vanishes at all points upstream (as is the case if one has laminar flow, for instance)
then the vorticity must vanish everywhere. Since $\boldsymbol{\omega} = \nabla \times \mathbf{u}$ this means that the flow velocity must be
the gradient of some scalar function $\Psi$, known as the velocity potential:

$$\mathbf{u}(\mathbf{r}, t) = \nabla\Psi(\mathbf{r}, t). \qquad (20.25)$$

A general flow has three degrees of freedom at each point in space, whereas a potential flow has
only one since it is derived from a single scalar function $\Psi$. One way to see this is to write the flow
as a Fourier synthesis

$$\mathbf{u}(\mathbf{r}, t) = \sum_{\mathbf{k}} \mathbf{u}_{\mathbf{k}}\, e^{i \mathbf{k} \cdot \mathbf{r}}. \qquad (20.26)$$

For a general flow we need to specify three values $\mathbf{u}_{\mathbf{k}}$ for each $\mathbf{k}$ whereas for a potential flow
$\mathbf{u}_{\mathbf{k}} = i \mathbf{k} \Psi_{\mathbf{k}}$.

Now, neglecting gravity and assuming adiabaticity (so $\nabla P/\rho = \nabla h$) Euler’s equation is

$$\frac{\partial \mathbf{u}}{\partial t} + \frac{1}{2} \nabla u^2 + \boldsymbol{\omega} \times \mathbf{u} = -\nabla h \qquad (20.27)$$

but for a potential flow, $\boldsymbol{\omega} = 0$, and $\partial\mathbf{u}/\partial t = \partial(\nabla\Psi)/\partial t$, and so we have

$$\nabla \left(\frac{\partial \Psi}{\partial t} + \frac{1}{2} u^2 + h\right) = 0 \qquad (20.28)$$

implying

$$\frac{\partial \Psi}{\partial t} + \frac{1}{2} u^2 + h = f(t). \qquad (20.29)$$

In fact, we can always set $f = 0$ without loss of generality since one can always make the
transformation $\Psi \rightarrow \Psi' = \Psi + F(t)$ for arbitrary $F(t)$ without changing the flow $\mathbf{u} = \nabla\Psi$ and one can
therefore choose $F(t)$ such that $dF/dt = -f$.

For a steady flow $\partial\Psi/\partial t = 0$ and $f(t) =$ constant and we have

$$\frac{1}{2} u^2 + h = {\rm constant}. \qquad (20.30)$$

It is interesting to compare this with Bernoulli’s equation (20.14) which looks very similar. However,
the latter states that $u^2/2 + h$ is constant along streamlines, and allows the possibility that the
constant is different for different streamlines. Equation (20.30), in contrast, is more restrictive and
states that for a steady potential flow $u^2/2 + h$ is the same for all streamlines.


20.7 Incompressible Potential Flows 

Potential flows have $\nabla \times \mathbf{u} = 0$, implying $\mathbf{u} = \nabla\Psi$. If in addition we require that the flow be
incompressible, or approximately so, then the continuity equation $D\rho/Dt = -\rho \nabla \cdot \mathbf{u}$ implies that
$\nabla \cdot \mathbf{u} = 0$ also, and therefore

$$\nabla^2 \Psi = 0. \qquad (20.31)$$

Thus the velocity potential is the solution of Poisson’s equation with vanishing source term (i.e. Laplace’s equation).

An example is $\Psi = -1/|\mathbf{r}|$ for $r \ne 0$, which gives a velocity $\mathbf{u} = \hat{\mathbf{r}}/r^2$ which is the $1/r^2$ mass
conserving flow one would expect to find for a constant source of fluid injected at $r = 0$.
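
The point-source example can be verified numerically. The Python sketch below uses simple finite differences, at an arbitrarily chosen test point, to check that the potential $-1/|\mathbf{r}|$ has a gradient of magnitude $1/r^2$ directed radially outwards and a vanishing Laplacian away from the origin. The step sizes and the test point are assumptions of this sketch.

```python
import math

def psi(x, y, z):
    """Velocity potential of a point source (up to a constant factor)."""
    return -1.0 / math.sqrt(x * x + y * y + z * z)

def gradient(f, p, h=1.0e-5):
    """Central-difference gradient of f at the point p."""
    x, y, z = p
    return (
        (f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
        (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
        (f(x, y, z + h) - f(x, y, z - h)) / (2 * h),
    )

def laplacian(f, p, h=1.0e-3):
    """Second-order finite-difference Laplacian of f at the point p."""
    x, y, z = p
    return (
        f(x + h, y, z) + f(x - h, y, z)
        + f(x, y + h, z) + f(x, y - h, z)
        + f(x, y, z + h) + f(x, y, z - h)
        - 6.0 * f(x, y, z)
    ) / (h * h)

p = (0.3, -1.2, 0.7)                        # arbitrary point away from r = 0
r = math.sqrt(sum(c * c for c in p))
u = gradient(psi, p)
speed = math.sqrt(sum(c * c for c in u))
radial = sum(a * b for a, b in zip(u, p)) / (speed * r)   # cosine of angle to r-hat
lap = laplacian(psi, p)
print("speed * r^2 =", speed * r * r, " u.r-hat/|u| =", radial, " laplacian =", lap)
```

The first two numbers should be close to unity and the third close to zero, up to the truncation error of the finite differences.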

In this context the Euler equation (now including the gravitational potential $\Phi$) is

$$\frac{\partial \nabla\Psi}{\partial t} + \frac{1}{2} \nabla u^2 + \nabla\Phi + \nabla(P/\rho) = 0 \qquad (20.32)$$

or

$$\frac{\partial \Psi}{\partial t} + \frac{u^2}{2} + \Phi + P/\rho = 0 \qquad (20.33)$$

where we have assumed an appropriate $F(t)$ in $\Psi$ to set the right hand side to zero.

If the external pressure on some boundary is specified then (20.33) (with $u^2 = (\nabla\Psi)^2$) supplies
the boundary condition for (20.31).

Note that Bernoulli’s equation for steady incompressible potential flows is

$$\frac{1}{2} u^2 + \Phi + P/\rho = {\rm constant} \qquad (20.34)$$

or, neglecting gravity,

$$\frac{1}{2} u^2 + P/\rho = {\rm constant} \qquad (20.35)$$

which provides a direct relationship between flow velocity and pressure. If we consider an obstruction
in a potential flow this implies that the pressure will be the greatest at the stagnation points on the
surface of the obstruction where $u = 0$. This also allows one to see how an aerofoil can provide lift if
it is shaped such that the fluid must flow faster over the upper surface than over the lower surface.


20.8 Gravity Waves 


An interesting application of potential flow theory is provided by gravity waves such as occur in the 
ocean or in atmospheres of stars and planets. 

For concreteness, let us consider waves in a bath of incompressible fluid with negligible external 
pressure. 

One can easily derive the main features of waves in such a system from order of magnitude
arguments. These reveal an important distinction between the cases when the bath is deep or
shallow as compared to the wavelength of the wave. In the former case, if we displace some fluid so
that the surface in a region of size $L$ is raised by an amount $h$, then we expect the fluid to respond
by sinking back causing a flow extending to depth $\sim L$ below the surface. The mass of fluid involved
in the flow is therefore $M \sim \rho L^3$ and the gravitational force is $F \sim L^2 g \rho h$. Setting $F = Ma = -M\ddot{h}$
gives

$$\ddot{h} \sim -\frac{g}{L}\,h \qquad (20.36)$$

which is a simple harmonic oscillator with frequency

$$\omega^2 \sim g/L. \qquad (20.37)$$

One can read from this that gravity waves in a deep body of fluid are dispersive, since $\omega \sim \sqrt{gk}$
(with $k \sim 2\pi/\lambda$ the wave-number) and therefore phase and group velocities $v_{\rm phase} = \omega/k$ and
$v_{\rm group} = d\omega/dk$ have non-trivial dependence on wavelength. Also, one can see that the period of
gravity waves of length-scale $L$ is on the order of the period of a pendulum of that length.

One can perform a similar argument for a shallow bath of depth $D \ll \lambda$, in which case the mass
of fluid involved is $M \sim \rho L^2 D$ whereas the restoring force is the same as before $F \sim L^2 g \rho h$ and,
allowing for the fact that the flow is now predominantly horizontal, one
obtains an equation of motion

$$\ddot{h} \sim -\frac{gD}{L^2}\,h \qquad (20.38)$$

which is a non-dispersive wave equation.

Figure 20.3: Geometry for the calculation of gravity wave dispersion relation.


Now let’s see how this can be made more precise. We start with the Euler equation as above,
replace $\Phi \rightarrow gz$, and linearize in the amplitude of the waves. This means we drop the $u^2$ term since
it is of second order. Letting $\zeta$ denote the displacement of the surface (see figure 20.3), and with
zero (or negligible) external pressure, Euler’s equation provides the boundary condition

$$\left(\frac{\partial \Psi}{\partial t}\right)_{z=\zeta} + g\zeta = 0 \qquad (20.39)$$

or, more usefully, taking the time derivative of this and using $\dot\zeta = u_z = \partial\Psi/\partial z$, this is

$$\left(g \frac{\partial \Psi}{\partial z} + \frac{\partial^2 \Psi}{\partial t^2}\right)_{z=\zeta} = 0. \qquad (20.40)$$

This condition must be satisfied on the surface $z = \zeta$. However, since both $\Psi$ and $\zeta$ are first order
quantities, we can effectively apply the above condition at $z = 0$ and make only an error of second
order. The pair of equations we need to solve are

$$\nabla^2 \Psi = 0$$
$$\left(g \frac{\partial \Psi}{\partial z} + \frac{\partial^2 \Psi}{\partial t^2}\right)_{z=0} = 0 \qquad (20.41)$$

As usual, we want to look for traveling wave solutions $\Psi = f(z)\,e^{i(\omega t - kx)}$ and let’s guess that
$f(z)$ is some kind of exponential, so substituting $\Psi = c\,e^{az + i(\omega t - kx)}$ in Poisson’s equation gives
$\nabla^2\Psi = (a^2 - k^2)\Psi = 0$ and therefore $a = \pm k$. Since $z$ is negative below the surface, for the infinitely
deep bath (which is typically the most interesting case) we need to take the solution $a = +k$ (with $k > 0$), so the
velocity field disturbance falls off exponentially with depth with e-folding scale $k^{-1} = \lambda/2\pi$. This
justifies the assumption in the order of magnitude analysis that the velocity field extends a distance
$\sim \lambda$ below the surface. The boundary condition equation now becomes $gk\Psi = \omega^2\Psi$ which gives
the dispersion relation

$$\omega(k) = \sqrt{gk} \qquad (20.42)$$

again in accord with the hand-waving analysis.

For a bath of finite depth one needs to provide also a boundary condition that $u_z = \partial\Psi/\partial z$
vanish on the floor of the bath, and one finds then that the solution is a combination of the growing
and decaying exponential solutions.

The dispersive nature of these waves is quite interesting. The group velocity is $d\omega/dk \propto k^{-1/2}$,
so the group velocity scales as the square root of the wavelength. Thus a localized disturbance (an
impulse) will evolve into a ‘chirp’ with the low frequencies arriving first.

Note that $v_{\rm phase} = \sqrt{g/k}$ whereas $v_{\rm group} = \frac{1}{2}\sqrt{g/k}$, which means that wave crests travel twice as
fast as the group velocity. For a packet of waves this means that wave crests will appear as if from
nowhere at the tail of the packet, march forward through the packet, and disappear at the front.
All of this is familiar to anyone who has idly tossed pebbles into a pond.
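The factor of two between phase and group velocity can be checked numerically. The following is a minimal sketch, assuming the deep-water relation $\omega = \sqrt{gk}$ and an Earth-surface value for $g$; the function names and the finite-difference step are illustrative choices, not part of the text.

```python
import math

G = 9.81  # m/s^2, assumed Earth-surface gravitational acceleration


def omega(k, g=G):
    """Deep-water dispersion relation omega(k) = sqrt(g k)."""
    return math.sqrt(g * k)


def phase_velocity(k, g=G):
    """Speed of individual wave crests, omega / k."""
    return omega(k, g) / k


def group_velocity(k, g=G, dk_frac=1e-6):
    """Centered finite-difference estimate of d(omega)/dk."""
    dk = k * dk_frac
    return (omega(k + dk, g) - omega(k - dk, g)) / (2.0 * dk)


if __name__ == "__main__":
    for wavelength in (1.0, 10.0, 100.0):  # metres
        k = 2.0 * math.pi / wavelength
        print(wavelength, phase_velocity(k), group_velocity(k))
```

Longer wavelengths give larger velocities in both columns, which is why the low frequencies of a distant disturbance arrive first.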









20.9 Sound Waves 


To obtain the wave equation for small amplitude acoustic oscillations we start with the equations of 
continuity and the Euler equation 


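$$ \frac{\partial \rho}{\partial t} + \mathbf{u} \cdot \nabla \rho = -\rho \nabla \cdot \mathbf{u}, \qquad \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla P / \rho. \tag{20.43} $$

We then set

$$ \rho = \rho_0 + \rho_1, \qquad P = P_0 + P_1, \qquad \mathbf{u} = 0 + \mathbf{u}_1 \tag{20.44} $$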


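where a subscript 0 denotes the equilibrium solution in the absence of waves (so $\mathbf{u}_0 = 0$ of course)
and a subscript 1 denotes an assumed small perturbation about the equilibrium.

Substituting (20.44) in (20.43) and keeping only terms of first order gives

$$ \frac{\partial \rho_1}{\partial t} + \rho_0 \nabla \cdot \mathbf{u}_1 = 0, \qquad \frac{\partial \mathbf{u}_1}{\partial t} + \frac{1}{\rho_0} \nabla P_1 = 0. \tag{20.45} $$

Now if the oscillations are adiabatic then

$$ P_1 = \left(\frac{\partial P}{\partial \rho}\right)_s \rho_1 \tag{20.46} $$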


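and taking the time derivative of the first of (20.45) and subtracting $\rho_0$ times the divergence of the
second gives

$$ \frac{\partial^2 \rho_1}{\partial t^2} - \left(\frac{\partial P}{\partial \rho}\right)_s \nabla^2 \rho_1 = 0 \tag{20.47} $$

which is a wave equation in the scalar quantity $\rho_1$ with sound speed $c_s$ given by

$$ c_s^2 = \left(\frac{\partial P}{\partial \rho}\right)_s. \tag{20.48} $$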


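For the special case of a planar disturbance, $\partial/\partial y = \partial/\partial z = 0$, we have

$$ \frac{\partial^2 \rho_1}{\partial t^2} - c_s^2 \frac{\partial^2 \rho_1}{\partial x^2} = 0 \tag{20.49} $$

with general solution

$$ \rho_1(x, t) = f_1(x - c_s t) + f_2(x + c_s t) \tag{20.50} $$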


where $f_1$ and $f_2$ are two arbitrary functions which must be determined from the boundary conditions.
In general the solutions of the 3-dimensional wave equation are superpositions of traveling waves

$$ \rho_1(\mathbf{r}, t) = \sum_{\mathbf{k}} A_{\mathbf{k}} e^{i(\omega t - \mathbf{k} \cdot \mathbf{r})}. \tag{20.51} $$

The dispersion relation for these waves is

$$ \omega^2 - c_s^2 k^2 = 0 \quad \rightarrow \quad \omega = c_s k \tag{20.52} $$


so these waves are non-dispersive. 

We can see under what conditions we are allowed to neglect terms of 2nd order (and higher) in
the wave amplitude. In passing from (20.43) to (20.47) we have, for example, dropped
$(\mathbf{u} \cdot \nabla)\mathbf{u}$ as compared to $\partial \mathbf{u}/\partial t$, but for a plane wave $(\mathbf{u} \cdot \nabla)\mathbf{u} \sim k u^2$ while $\partial \mathbf{u}/\partial t \sim \omega u$, so this
is justified provided $ku \ll \omega$ or, since $\omega = k c_s$, if $u \ll c_s$. The linearized wave equation is therefore valid
provided the velocity associated with the waves is small compared to the sound speed. It is easy
to see from the first of (20.45) that $\rho_1/\rho_0 \sim ku/\omega \sim u/c_s$, so the linearized wave equation is valid
provided the fractional amplitude of the density fluctuation is small: $\rho_1/\rho_0 \ll 1$ also.

The sound speed is given by (20.48). For adiabatic compression or rarefaction we found $\epsilon \propto \rho^{2/3}$
and $P \propto \rho\epsilon \propto \rho^{5/3}$, so $(\partial P/\partial \rho)_s = (5/3)(P_0/\rho_0)$. Now $P_0 = \rho_0 \langle w^2 \rangle / 3$ and therefore we have

$$ c_s = \sqrt{\frac{5}{9}}\, \langle w^2 \rangle^{1/2} \tag{20.53} $$

i.e. the sound speed is on the order of the rms random thermal velocity. Note that the sound speed
is independent of the density and is proportional to the square root of the temperature.
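As a rough numerical illustration of this result, the sketch below evaluates the rms thermal velocity and the adiabatic sound speed for a monatomic ideal gas. It assumes a Maxwellian distribution, so that $\langle w^2 \rangle = 3kT/m$; the function names and the choice of helium as a test gas are illustrative assumptions, not taken from the text.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
M_U = 1.66053907e-27  # atomic mass unit in kg


def rms_thermal_velocity(temperature, mass):
    """sqrt(<w^2>) = sqrt(3 k T / m) for a Maxwellian gas (SI units)."""
    return math.sqrt(3.0 * K_B * temperature / mass)


def sound_speed_monatomic(temperature, mass):
    """c_s = sqrt(5/9) * sqrt(<w^2>) = sqrt(5 k T / (3 m))."""
    return math.sqrt(5.0 / 9.0) * rms_thermal_velocity(temperature, mass)


if __name__ == "__main__":
    m_helium = 4.0026 * M_U  # assumed test gas: helium atoms
    for temperature in (100.0, 300.0, 1200.0):
        print(temperature,
              rms_thermal_velocity(temperature, m_helium),
              sound_speed_monatomic(temperature, m_helium))
```

Neither function takes the density as an argument, which mirrors the statement that the sound speed depends only on temperature (and particle mass).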



20.10 Problems 


20.10.1 Ideal fluids 


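Write down the continuity and energy equations for an ideal gas in the form

$$ \left(\frac{\partial}{\partial t} + \mathbf{u} \cdot \nabla\right) \rho = \ldots, \qquad \left(\frac{\partial}{\partial t} + \mathbf{u} \cdot \nabla\right) \epsilon = \ldots \tag{20.54} $$

where $\epsilon$ is the specific thermal energy density.

Combine these equations to show that $\rho \epsilon^{-3/2}$ is constant along fluid trajectories, so $\epsilon \propto \rho^{2/3}$,
and that the specific entropy $s$ is therefore constant.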

Estimate the rate (in degrees per km) at which dry air cools adiabatically if raised in height from 
sea level. 


20.10.2 Potential flows 

Potential flows have vanishing vorticity: $\nabla \times \mathbf{u} = 0$. A sufficient condition for this is that the velocity
be the gradient of some scalar potential field $\phi(\mathbf{r})$. By considering the Fourier transform of a general
3-dimensional vector field (or otherwise) show that this is also a necessary condition.

Show that it is possible to reconstruct the full 3-dimensional velocity field for a potential flow 
from measurements only of the line-of-sight component of the velocity. This technique is used in 
studies of cosmological ‘bulk-flows’ caused by nearby superclusters. 

20.10.3 Hydrostatic equilibrium 

a) Write down the equations of hydrostatic equilibrium for gas of density $\rho$ and pressure $P$. Show
that if $P = a\rho^2$ these can be combined to yield

$$ \nabla^2 \rho + \frac{2\pi G}{a} \rho = 0 \tag{20.55} $$

b) Show that this equation admits spherically symmetric solutions of the form $\rho(r) = A \sin kr / r$
if $r < \pi/k$, $\rho = 0$ otherwise, where $k = \sqrt{2\pi G/a}$ and $A$ is an arbitrary constant (which is fixed once
the total mass is specified).

c) Show that there are also solutions of the form $\rho(r) = B \cos kr / r$, but that these require an
additional point mass at the origin.

d) Interestingly, the above equation also admits non-spherical solutions:

$$ \rho(\mathbf{r}) = A \cos(k_x x) \cos(k_y y) \cos(k_z z) \tag{20.56} $$

for $|x| < \pi/2k_x$ etc. Find the condition that must be satisfied by the coefficients $k_x, k_y, k_z$ and
sketch some isodensity contours in the plane $z = 0$ and for $k_x = k_y = 2/\pi$. Does this solution seem
physically reasonable to you? Discuss what may be wrong with this type of solution.

e) Returning to the spherical case, consider the run of entropy with radius for the $\rho(r) = A \sin kr / r$
solution. Does it increase or decrease with $r$? Is the solution convectively stable?
Chapter 21

Viscous Fluids 


21.1 Transport Coefficients 


The ideal fluid equations (20.1) were obtained from the general fluid equations (19.57) by dropping
the terms involving the viscous shear tensor $\pi_{ij}$ and the heat flux $\mathbf{F}$. These can be calculated from
kinetic theory, in the limit that the mean free path is small compared to the scales over which
macroscopic conditions vary. To lowest order the solution of the Boltzmann transport equation
$Lf = (\partial f/\partial t)_{\rm coll}$ is a shifted Maxwellian

$$ f(\mathbf{r}, \mathbf{v}) = \frac{n(\mathbf{r})}{(2\pi k T(\mathbf{r})/m)^{3/2}} \exp\left(-\frac{1}{2} m |\mathbf{v} - \mathbf{u}(\mathbf{r})|^2 / k T(\mathbf{r})\right) \tag{21.1} $$


However, this cannot be an exact solution of the BTE since the collision term vanishes identically,
but the Liouville operator contains, for example, the spatial gradient term $\mathbf{v} \cdot \nabla_x f$ which is non-zero,
so one needs to augment the locally Maxwellian zeroth order approximation with correction terms.
The detailed calculation of the transport coefficients is known as the Chapman-Enskog procedure
which is described in Huang’s book for example. The key results are that the viscous shear tensor
is given by

$$ \pi_{ij} = \mu D_{ij} \tag{21.2} $$

where

$$ D_{ij} = \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} - \frac{2}{3} \delta_{ij} \nabla \cdot \mathbf{u} \tag{21.3} $$

where $D_{ij}$ is the deformation rate tensor which is simply a symmetrized and traceless version of the
shear tensor $\partial u_i/\partial x_j$. This says that the viscous shear tensor (which is essentially the traceless part
of the velocity dispersion tensor) is driven by the velocity shear. The coefficient of shear viscosity,
or more briefly the viscosity, $\mu$ in (21.2) is given by


h = 


5 ( nmkT ) 1 / 2 


(21.4) 


where $\sigma$ is the velocity averaged cross-section. To order of magnitude $\mu \sim m v_T/\sigma$, where $v_T \sim \sqrt{kT/m}$
is the typical thermal random velocity.

The heat flux is given by

$$ \mathbf{F} = -\kappa \nabla T \tag{21.5} $$

where $\kappa/\mu = (5/2) C_V$ is known as Eucken’s constant.

These results are well confirmed predictions of kinetic theory. The derivation is mathematically
involved, but the general form of the results can be well understood from simple arguments. Let
us model a shearing fluid as a set of slabs of thickness on the order of the mean free path $\lambda$, as
illustrated in figure 21.1. Each mean free time (or collision time) $\tau = \lambda/v_T$ most of the particles in
a slab will be replaced by particles from the neighboring slabs.

Figure 21.1: The effect of shear viscosity on irregularities in the velocity field can be understood
crudely as a smoothing effect due to random motion of atoms. Here we have a stack of
slabs, with velocities $u(z_0 - 2\lambda), \ldots, u(z_0 + 2\lambda)$, each of which is roughly one mean free path thick.
In one collision time $\tau \sim \lambda/v_T$, a good fraction of the particles in the center slab will have departed,
to be replaced by the particles which were in the neighboring slabs. The new velocity of the center slab
is approximately the average velocity of the two neighbors. This means that the acceleration of the center slab is
$(\partial u/\partial t)_{\rm visc} \sim \lambda^2 (\partial^2 u/\partial z^2)/(\lambda/v_T) \sim \lambda v_T\, \partial^2 u/\partial z^2$. Another way to look at this is that there is a
flux of $x$-component of momentum in the $z$-direction $\Delta P/\Delta A \Delta t \sim (m v_T/\sigma)\, \partial u/\partial z$. For the slabs
adjacent to the center slab, for instance, there is a gradient of velocity, but no acceleration since the
same amount of momentum flows through each face. For the center slab, however, there is a loss of
$x$-momentum through both faces, so the slab decelerates.

Consider the situation where there
is a continuous variation of the $x$-component of the flow velocity $u(z)$. If a particular slab at $z_0$ is
moving faster than the average of its two neighbors then, after one collision time, this slab will have
velocity $u' = u(z_0, t + \tau) \sim (u(z_0 + \lambda, t) + u(z_0 - \lambda, t))/2$ and the acceleration of this slab is therefore

$$ \frac{\partial u}{\partial t} \sim \frac{\Delta u}{\Delta t} \sim \frac{u(z_0, t+\tau) - u(z_0, t)}{\tau} \sim \frac{u(z_0+\lambda, t) - 2u(z_0, t) + u(z_0 - \lambda, t)}{2\tau} \tag{21.6} $$

but if the velocity field $u(z)$ varies smoothly the numerator in the final expression is approximately
$\lambda^2 \partial^2 u/\partial z^2$, and since $\tau \sim \lambda/v_T$, the viscous acceleration is

$$ \left(\frac{\partial u}{\partial t}\right)_{\rm visc} \sim v_T \lambda \nabla^2 u. \tag{21.7} $$


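On the other hand, in the fluid equations (19.57) this term is

$$ \left(\frac{\partial \mathbf{u}}{\partial t}\right)_{\rm visc} = \frac{1}{\rho} \nabla \cdot \pi \sim \frac{\mu}{\rho} \nabla^2 \mathbf{u} \tag{21.8} $$

since $\pi \sim \mu \nabla u$. These are compatible if $\mu \sim \rho v_T \lambda \sim (mn)(kT/m)^{1/2}(n\sigma)^{-1} \sim \sqrt{mkT}/\sigma$, in
agreement with (21.4).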

The exchange of particles between slabs gives a momentum flux (in this case a flux of $x$-momentum
in the direction in which $u_x$ is decreasing). The momentum exchanged between two
slabs of area $A$ is $\Delta P \sim \rho A \lambda \Delta u \sim m n A \lambda^2\, \partial u/\partial z$, which is exchanged in time $\Delta t = \lambda/v_T$, and the
momentum transport per unit time per unit area (or the force per unit area) is

$$ \frac{\Delta P}{A \Delta t} \sim \frac{m v_T}{\sigma} \frac{\partial u}{\partial z} \tag{21.9} $$


so the momentum flux is proportional to the shear. If there is just a pure gradient of velocity then
the momentum flux is constant, and the momentum of any slab does not change with time. If there
is a second derivative of the flow velocity then the force per unit area will be different on the top
and bottom of the slab and there will be an acceleration of the slab (see figure 21.1).

It is interesting that the density of particles does not appear in the viscosity or in the momentum
flux. While the number of particles per second crossing a surface increases with $n$, the distance they
travel decreases since $\lambda \sim 1/(n\sigma)$ and these effects cancel.

The effect of viscosity in the Euler equation is to gradually smooth out the velocity flow. If
$u(z)$ is initially fluctuating then the faster slabs will decelerate and vice versa until the flow becomes
uniform with $u \rightarrow \bar{u}$, the mean of the initial velocity. Since the bulk kinetic energy density is $\rho u^2/2$, this
smoothing of the velocity field results in a net loss (or dissipation) of the kinetic energy. This energy
ends up in heat, and appears in the internal energy equation (19.54) as the ‘viscous dissipation’ term
$\sim \pi \nabla u$.
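The slab picture can be turned into a toy calculation. The sketch below is only a cartoon under stated assumptions: slabs of equal mass on a periodic stack, with a fixed fraction of each slab's particles (here one half, an arbitrary illustrative choice) swapped with its two neighbors every collision time. The function names are hypothetical. It shows the mean velocity being conserved while the bulk kinetic energy decays.

```python
def mix_slabs(u, fraction=0.5):
    """One collision time: each slab trades `fraction` of its particles,
    half with the slab above and half with the slab below (periodic stack)."""
    n = len(u)
    return [
        u[i] + 0.5 * fraction * (u[(i + 1) % n] - 2.0 * u[i] + u[(i - 1) % n])
        for i in range(n)
    ]


def mean(u):
    """Mean flow velocity, proportional to the total momentum."""
    return sum(u) / len(u)


def kinetic_energy(u):
    """Bulk kinetic energy per unit mass, averaged over slabs."""
    return 0.5 * sum(x * x for x in u) / len(u)


# Initial condition: a square-wave velocity profile (arbitrary units).
u0 = [1.0 if i % 8 < 4 else -0.5 for i in range(32)]
history = [u0]
for _ in range(200):
    history.append(mix_slabs(history[-1]))

if __name__ == "__main__":
    for step in (0, 10, 50, 200):
        print(step, mean(history[step]), kinetic_energy(history[step]))
```

The update rule is the discrete second difference of (21.6), so the toy model is just an explicit diffusion step with diffusivity of order $\lambda v_T$; the kinetic energy that disappears is what the fluid equations book as heat.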

The form of the thermal conductivity can be understood from the same kind of argument. If
there is a gradient of temperature then the difference in energy per particle over a distance $\lambda$ is
$\Delta E_1 \sim k \lambda \nabla T$, and since the number of particles in a slab of volume $A\lambda$ is $N \sim nA\lambda$ and these
exchange in a time $\tau = \lambda/v_T$, the flux of energy per unit area is

$$ F \sim \frac{N \Delta E_1}{A \Delta t} \sim \frac{k v_T}{\sigma} \nabla T \tag{21.10} $$

again in accord with (21.5).


21.2 Damping of Sound Waves 


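Keeping the viscous term in the Euler equation and the dissipation term in the internal energy
equation and linearizing as before gives

$$ \frac{\partial \mathbf{u}}{\partial t} = -\frac{1}{\rho_0} \nabla P_1 + \alpha \lambda v_T \nabla^2 \mathbf{u}, \qquad \frac{\partial \epsilon_1}{\partial t} = -\frac{P_0}{\rho_0} \nabla \cdot \mathbf{u} + \beta \lambda v_T \nabla^2 \epsilon_1 \tag{21.11} $$

with $\alpha$, $\beta$ dimensionless factors of order unity. Strictly speaking, the dissipation term here will cause
the gas to heat up, and the background solution $\epsilon_0$ will not be exactly constant, as we have assumed.
However, the energy in the waves is second order in the wave amplitude and we can ignore this and
take the gas to be effectively isentropic, so $\nabla P_1 = (5/2)(P_0/\epsilon_0) \nabla \epsilon_1 = (5/3)\rho_0 \nabla \epsilon_1$.

If we try solutions

$$ \mathbf{u} = \hat{\mathbf{k}}\, U e^{i(\omega t - \mathbf{k} \cdot \mathbf{x})}, \qquad \epsilon_1 = E e^{i(\omega t - \mathbf{k} \cdot \mathbf{x})} \tag{21.12} $$

then the differential equations (21.11) become the algebraic equations

$$ i\omega U - \frac{5}{3} i k E + \nu k^2 U = 0, \qquad i\omega E - \frac{3}{5} i k c_s^2 U + \frac{\beta}{\alpha} \nu k^2 E = 0 \tag{21.13} $$

where we have defined the kinematic viscosity $\nu = \alpha \lambda v_T$.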

The first of these equations gives $U = \frac{5}{3} i k E/(i\omega + \nu k^2)$, and using this in the second equation
and dividing through by $E$ gives the dispersion relation

$$ (i\omega + \nu k^2)\left(i\omega + \frac{\beta}{\alpha} \nu k^2\right) = -c_s^2 k^2 \tag{21.14} $$

and if we assume that the damping is weak (so the fractional change in the amplitude of the wave
per cycle is small) then the frequency is

$$ \omega = c_s k + i\Gamma \tag{21.15} $$

where the damping rate is $\Gamma \sim \nu k^2$.

• This damping rate is proportional to $k^2$ so short waves damp out fastest.

• The damping time is $t_{\rm damp} \sim L^2/\lambda v_T \sim (L/\lambda)^2 t_{\rm coll}$ (with $\lambda$ the mean free path and $L$ the
wavelength), which is just the time it takes for a particle to random walk a distance $L$.
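The weak-damping limit can be checked by solving the quadratic dispersion relation (21.14) exactly. The sketch below is illustrative only: the function name is hypothetical, the ratio $\beta/\alpha$ is set to one for want of a better value, and the air parameters ($\nu \approx 0.15$ cm$^2$/s, $c_s \approx 3.4 \times 10^4$ cm/s) are assumed round numbers.

```python
import cmath


def damped_sound_frequency(k, c_s, nu, beta_over_alpha=1.0):
    """Solve (i w + nu k^2)(i w + b nu k^2) = -c_s^2 k^2 for the complex
    frequency w with positive real part, where b = beta/alpha."""
    b = beta_over_alpha
    # With s = i*w the relation is s^2 + lin*s + const = 0.
    lin = (1.0 + b) * nu * k ** 2
    const = b * nu ** 2 * k ** 4 + c_s ** 2 * k ** 2
    s = (-lin + cmath.sqrt(lin ** 2 - 4.0 * const)) / 2.0
    return -1j * s


if __name__ == "__main__":
    c_s = 3.4e4  # cm/s, assumed sound speed in air
    nu = 0.15    # cm^2/s, assumed kinematic viscosity of air
    for wavelength in (1.0, 10.0, 100.0):  # cm
        k = 2.0 * cmath.pi / wavelength
        w = damped_sound_frequency(k, c_s, nu)
        # Real part ~ c_s k, imaginary part ~ nu k^2; damping time ~ 1/Im(w).
        print(wavelength, w.real, w.imag, 1.0 / w.imag)
```

In this weakly damped regime the imaginary part is $(1 + \beta/\alpha)\nu k^2/2$, consistent with $\Gamma \sim \nu k^2$, and the damping time grows as the square of the wavelength.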
21.3 Reynolds Number


The Euler equation for a potential flow and neglecting gravity is

$$ \frac{\partial \mathbf{u}}{\partial t} + \frac{1}{2} \nabla u^2 = -\frac{1}{\rho} \nabla P + \nu \nabla^2 \mathbf{u}. \tag{21.16} $$

Ignoring the last term on each side gave non-dissipating sound waves. Including viscosity, as we
have seen, causes the sound waves to decay. The second term on the left hand side, which we shall
call the ‘inertial term’, in contrast, leads to instability in the sense that irregularities will tend to
grow. One way to see this is to consider a single planar sound wave with spatial frequency $k_0$. The
term $\nabla u^2/2$ therefore contains a component at spatial frequency $2k_0$. This term therefore appears
as a driving term for the oscillation mode $k = 2k_0$. The inertial term therefore couples modes of
different spatial frequencies. In the absence of viscosity this leads to the ‘turbulent cascade’ with
energy propagating from large scales to smaller scales.

This natural tendency to instability for non-viscous fluids is stabilized if the
viscosity is sufficiently large. If we have a velocity disturbance of amplitude $u$ on scale $L$ then the
relative importance of the inertial and viscous terms is given by the dimensionless ratio

$$ \frac{|\nabla u^2|}{|\nu \nabla^2 u|} \sim \frac{u^2/L}{\nu u/L^2} = \frac{uL}{\nu} \tag{21.17} $$

which is called the Reynolds number.

• If the Reynolds number for a system is large compared to unity the viscous effects are relatively
unimportant and vice versa.

• Since $\nu \sim v_T \lambda$ the Reynolds number can be written as ${\rm Re} \sim (u/c_s)(L/\lambda)$. Now for a fluid
description to be at all valid we need $L \gg \lambda$, so the Reynolds number will typically be large
(unless the velocity associated with the wave is tiny compared to the sound speed).

• If two systems have the same Reynolds number the solutions are mathematically similar,
which means that the solutions for one can be obtained from those for another by applying a
suitable scaling of lengths and times.
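To get a feel for the numbers, the sketch below evaluates ${\rm Re} = uL/\nu$ for two stirred cups. All inputs are rough assumptions for illustration: a stirring speed of 10 cm/s, a cup size of 5 cm, a water-like $\nu \sim 10^{-2}$ cm$^2$/s for coffee, and $\nu \sim 70$ cm$^2$/s for honey (an order-of-magnitude guess). The function names are hypothetical.

```python
def reynolds_number(u, length, nu):
    """Re = u L / nu, with u in cm/s, L in cm and nu in cm^2/s."""
    return u * length / nu


def reynolds_number_kinetic(u, c_s, length, mean_free_path):
    """Order-of-magnitude form Re ~ (u / c_s) (L / lambda), using nu ~ v_T lambda
    and taking the thermal velocity v_T to be of order the sound speed c_s."""
    return (u / c_s) * (length / mean_free_path)


if __name__ == "__main__":
    stir_speed = 10.0  # cm/s, assumed
    cup_size = 5.0     # cm, assumed
    print("coffee:", reynolds_number(stir_speed, cup_size, 1.0e-2))
    print("honey: ", reynolds_number(stir_speed, cup_size, 70.0))
```

With these assumed values the coffee sits at Re of a few thousand and the honey below unity, in line with the everyday observation that stirred coffee swirls chaotically while stirred honey does not.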


21.4 Problems 

21.4.1 Viscous hydrodynamics 

Water has a kinematic viscosity $\nu \sim 10^{-2}$ cm$^2$/sec. Infer the mean free path for water molecules, and
estimate the damping time for gravity waves on the ocean as a function of their wavelength. Review
the derivation of the dispersion relation for deep ocean ‘gravity waves’: $\omega(k) = \sqrt{kg}$. Estimate the
shortest wavelength waves which can reach Hawaii effectively unattenuated from a distant storm in
the North Pacific.


21.4.2 Damping of Sound Waves 

Use the ideal fluid continuity and energy equations to derive the linearised wave equation for sound
waves. Show that the most general solution for planar sound wave disturbances is $f_1(x - ct) + f_2(x + ct)$
where $f_1$ and $f_2$ are arbitrary functions. Air has a kinematic viscosity $\nu \sim 0.15$ cm$^2$/sec. Using
random walk arguments, or otherwise, estimate the damping time for a sound wave at ‘middle-C’
(256Hz).
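21.4.3 Sound waves

The continuity, Euler and energy equations for an ideal gas can be written as

$$ \left(\frac{\partial}{\partial t} + \mathbf{u} \cdot \nabla\right) \rho = -\rho \nabla \cdot \mathbf{u}, \qquad \left(\frac{\partial}{\partial t} + \mathbf{u} \cdot \nabla\right) \mathbf{u} = -\nabla P/\rho - \nabla \phi, \qquad \left(\frac{\partial}{\partial t} + \mathbf{u} \cdot \nabla\right) \epsilon = -\frac{P}{\rho} \nabla \cdot \mathbf{u} \tag{21.18} $$

a) What do the symbols $\mathbf{u}$, $\rho$, $P$, $\phi$, $\epsilon$ represent?

b) Assuming a simple monatomic gas, combine the continuity and energy equations to show that
$\rho \epsilon^{-3/2}$ is constant along fluid trajectories, so $\epsilon \propto \rho^{2/3}$, and that the specific entropy $s$ is therefore
constant.

c) Let $\rho(\mathbf{r}, t) = \rho_0 + \rho_1(\mathbf{r}, t)$, $P(\mathbf{r}, t) = P_0 + P_1(\mathbf{r}, t)$ where $\rho_1$, $P_1$ are understood to be small
perturbations about a static uniform solution. Neglecting, for the moment, the effect of gravity,
linearise the continuity and Euler equations and combine these to obtain the wave equation for
adiabatic sound waves

$$ \frac{\partial^2 \rho_1}{\partial t^2} - \left(\frac{\partial P}{\partial \rho}\right)_s \nabla^2 \rho_1 = 0 \tag{21.19} $$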


d) Find the relation between the sound speed and the rms thermal velocity of atoms for a simple 
monatomic gas. How does the sound speed depend on density? How does the sound speed depend 
on temperature? 

e) For finite mean free path, diffusion effects will modify the fluid equations. Indicate the general 
form of the extra terms in the fluid equations at the next level of approximation (Navier-Stokes) 
and describe their effect on sound waves. 

f) Discuss qualitatively the effect of including the self-gravity of the sound wave. You should
explain what is meant by the ‘Jeans length’, and indicate how it is related to the sound speed and
the dynamical time $t_{\rm dyn} \sim (G\rho)^{-1/2}$.

Chapter 22

Fluid Instabilities 


Fluids are susceptible to various instabilities. We have already discussed convective instability. We 
first briefly mention the Rayleigh-Taylor and Kelvin-Helmholtz instabilities, and then discuss in a 
little more depth gravitational instability, thermal instability and turbulence. 


22.1 Rayleigh-Taylor and Kelvin-Helmholtz 

Rayleigh-Taylor instability arises when a heavy fluid lies over a lighter fluid; any small irregularity 
in the height of the interface will grow exponentially and rapidly become large. 

The term is also used to describe what happens in an explosion with dense ejected material 
incident on lighter surrounding material. 

In both situations, ripples in the interface grow into ‘fingers’ of dense matter penetrating the 
lighter material. 

Kelvin-Helmholtz instability arises, for instance, when wind blows across the surface of a fluid, 
or more generally when there is wind shear in a stratified medium. In the former case the instability 
can be stabilized by surface tension of the interface. For a shearing stratified fluid the instability can 
be stabilized by a sufficiently large entropy gradient, much as we found for convective instability. 
We will discuss this later when we consider turbulence, the onset of which is closely related to 
Kelvin-Helmholtz instability. 


22.2 Gravitational Instability 

Gravitational instability appears if we add the gravitational term to the linearized sound wave 
equations. Starting with the continuity and Euler equations 


$$ \dot\rho_1 = -\rho_0 \nabla \cdot \mathbf{u}, \qquad \dot{\mathbf{u}} = -c_s^2 \frac{\nabla \rho_1}{\rho_0} - \nabla \phi \tag{22.1} $$

with $\nabla^2 \phi = 4\pi G \rho_1$ giving the peculiar gravity caused by the density fluctuation. Taking the time derivative
of the first and using the second to eliminate the velocity gives

$$ \ddot\rho_1 = c_s^2 \nabla^2 \rho_1 + 4\pi G \rho_0 \rho_1 \tag{22.2} $$

and using trial solution $\rho_1 \propto e^{i(\omega t - \mathbf{k} \cdot \mathbf{x})}$ yields the dispersion relation

$$ \omega^2 = k^2 c_s^2 - 4\pi G \rho_0. \tag{22.3} $$

• The last term is the inverse square of the dynamical-time or collapse-time for a body of density
$\rho_0$.

• The squared frequency becomes negative (signalling exponentially growing instability) for
waves with wavelength exceeding a critical value

$$ \lambda_c = 2\pi/k_c = 2\pi c_s/\sqrt{4\pi G \rho_0} \tag{22.4} $$

• Crudely speaking, the criterion for stability of an over-dense region is that a sound wave should 
be able to cross the perturbation in less than the dynamical time. 

• We have performed the ‘Jeans swindle’ in side-stepping the issue of the fact that the assumed 
stable background about which we are making a linearized perturbation is in fact unstable. We 
will see how this can be justified when we consider growth of density fluctuations in cosmology. 
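The critical wavelength (22.4) is easy to evaluate. The following sketch works in cgs units; the function names are hypothetical and the example cloud parameters ($c_s = 2 \times 10^4$ cm/s, $\rho_0 = 10^{-20}$ g/cm$^3$) are assumed round numbers chosen only to illustrate the calculation.

```python
import math

G_NEWTON = 6.674e-8  # gravitational constant in cgs units (cm^3 g^-1 s^-2)
PARSEC = 3.086e18    # cm


def omega_squared(k, c_s, rho0):
    """Dispersion relation (22.3): omega^2 = k^2 c_s^2 - 4 pi G rho0."""
    return k ** 2 * c_s ** 2 - 4.0 * math.pi * G_NEWTON * rho0


def jeans_length(c_s, rho0):
    """Critical wavelength (22.4): lambda_c = 2 pi c_s / sqrt(4 pi G rho0)."""
    return 2.0 * math.pi * c_s / math.sqrt(4.0 * math.pi * G_NEWTON * rho0)


if __name__ == "__main__":
    c_s = 2.0e4    # cm/s, assumed sound speed of a cold cloud
    rho0 = 1.0e-20 # g/cm^3, assumed mean density
    lam = jeans_length(c_s, rho0)
    print("Jeans length: %.3g cm = %.3g pc" % (lam, lam / PARSEC))
    for factor in (0.5, 1.0, 2.0):
        k = 2.0 * math.pi / (factor * lam)
        print(factor, omega_squared(k, c_s, rho0))
```

Wavelengths shorter than $\lambda_c$ give positive $\omega^2$ (oscillating sound waves), while longer ones give negative $\omega^2$ and hence growth.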


22.3 Thermal Instability 


Consider a box with transparent walls full of atomic or molecular gas with uniform density p and 
temperature T. The atoms will have a Maxwellian velocity distribution and the atoms will have a 
thermal distribution of internal excitation levels. In the absence of any external radiation field the 
gas can cool through collisionally induced excitation followed by spontaneous emission, and there will 
be an emissivity which is a function of the temperature and the density of the gas: j = j(T, p). If we 
switch on an external radiation source, then one should be able to find combinations of temperature 
and pressure such that the radiation loss is just canceled by absorption, and the system will then be 
in equilibrium. However, for most temperatures, this equilibrium will be unstable. This is because 
the energy loss due to collisions scales as the density squared, whereas the energy gain rate is 
proportional to the density times the absorption cross-section. 

Imagine one has gas with thermodynamic parameters $(\rho_0, T_0)$ which is just in balance between
heating and cooling. If one were to perturb a parcel of this gas by injecting a little heat then it will
expand and move to a new point with a lower density. It must, however, have a higher temperature
than it had initially, since it must remain in pressure equilibrium. Similarly, if we remove a little
heat it must contract slightly, and become denser (and therefore cooler if it is to remain in pressure
equilibrium). This is illustrated in figure 22.1. If the cooling rate is insensitive to the temperature,
then an over-dense, and therefore cooler, perturbation will radiate more effectively and will become 
still denser and cooler. An under-dense, and therefore hotter, region will radiate less efficiently, 
so the radiative energy input will exceed the rate at which energy is radiated and the region will 
become even more under-dense and hotter. The initial density inhomogeneity seed, no matter how 
small, will grow exponentially and the system is said to be ‘thermally unstable’. 

The only way to avoid this instability is if the cooling rate is strongly dependent on the gas
temperature, with the cooling being more effective at higher temperature. If the temperature
dependence is sufficiently strong, then the cooling rate under isobaric conditions may decrease with
increasing density, and the gas can then be stable.

This strong, positive dependence of cooling rate on temperature can occur for certain, rather
specific, temperatures. For example, for atomic gas at temperatures below about $10^4$ K the
probability that an atom is excited is exponentially small; thus a small increase in temperature results in
an exponential increase in the fraction of excited atoms, and consequently in the radiative cooling
rate. A similar transition occurs around $10^2$ K for molecules where molecular vibration states get
excited.

A body of gas at an intermediate temperature, say around $10^3$ K, will be unstable, and will
precipitate over-dense clouds which will end up at around 100 K surrounded by a hotter gas at
around $10^4$ K. This leads to the ‘two-phase medium’ picture (or more generally multi-phase medium)
of the interstellar medium.

The condition for stability is $(\partial C/\partial T)_P > 0$, where $C$ is the cooling rate.

Figure 22.1: The curve shows an isobar passing through the state $(\rho_0, T_0)$, for which the collisionally
induced cooling just balances the energy input radiation. If we perturb an element of the gas by 
adding or removing some heat then it will move along this isobar. Since cooling ordinarily tends 
to be more efficient for higher densities, this is unstable (see text). To avoid the instability it is 
necessary to have the cooling rate increase strongly with temperature. 

22.4 Turbulence 

Empirically, flows with small Reynolds number are stable, whereas high Re flows are unstable and
lead to chaotic behavior. Examples are stirring cups of honey and coffee respectively. The latter type
of flow tends to form eddies, which form sub-eddies and so on, and there is said to be a turbulent
cascade of energy from some outer scale (the scale on which bulk kinetic energy is being injected)
down to some inner scale at which viscosity becomes important and the energy is converted to
heat.

Turbulent flows are highly non-linear and complicated and are a challenge even to numerical
calculations. There are however some simple scaling properties of turbulent flows that are expected
to hold if the inner scale is much less than the outer scale. These scaling laws are known as the
Kolmogorov spectrum. The range of scales $L_{\rm inner} \ll L \ll L_{\rm outer}$ is known as the inertial range.


22.4.1 Kolmogorov Spectrum 

Assuming the Mach number to be small, the flow can be modeled as approximately incompressible.
At length scale $L$, there will be eddies with some characteristic velocity $v(L)$ and the kinetic energy
density in these eddies is $\epsilon \sim \rho v^2(L)$. These eddies turn over on time-scale $\tau(L) \sim L/v(L)$ and it is
reasonable to assume that these eddies transfer their energy to smaller scale eddies on the order of
the turnover time. These eddies will also be receiving energy from larger scale ‘parent’ eddies. In
the steady state there can be no build up of energy at any particular scale, so we require that the
energy density transfer rate $d\epsilon/dt \sim \epsilon/\tau$ be independent of $L$. This gives $\rho v^2(L) \propto L/v(L)$, or

$$ v(L) \propto L^{1/3}; \tag{22.5} $$

this is the Kolmogorov velocity spectrum for fully developed turbulence.

22.4.2 Passive Additives


Similar arguments can be applied to the spatial power spectrum of passive additives. A passive
additive is a property which is conserved along the flow. Examples are the entropy of a gas, the water
vapor content of air, and the ‘creaminess’ of liquid in a just stirred cup of coffee. For the last example, defining
the creaminess $c(\mathbf{r})$ to be unity in the cream and zero otherwise, then after we pour the cream the
power spectrum $P_c(k)$ will be dominated by low frequencies, but the turbulent cascade will cause
power to be created at higher frequencies. Now the variance of the additive $\langle c^2 \rangle$ is conserved (until
we get down to scales where diffusion becomes effective). So again consider the steady state, in which
there is some agency injecting kinetic energy and creaminess fluctuations on a large scale $L_{\rm outer}$ and
these are cascading to smaller scales on a time-scale of order the eddy turnover time. The rate at
which variance is being transferred, $\langle c^2 \rangle_L/\tau$, should be independent of scale. This leads to

$$ \langle c^2 \rangle_L \propto \tau(L) = \frac{L}{v(L)} \propto L^{2/3}. \tag{22.6} $$

Since $\langle c^2 \rangle_L \sim (k^3 P_c(k))_{k \sim 1/L}$, the scaling (22.6) corresponds to a power-law power spectrum
$P_c(k) \propto k^n$ with $3 + n = -2/3$, or

$$ P_c(k) \propto k^{-11/3}. \tag{22.7} $$

For this type of power-law spectrum, the auto-correlation function is not well-defined (in reality,
the value $\xi_c(r)$ will be determined by the outer-scale) but the structure function is well defined and
has the power law form

$$ S_c(r) = \langle (c(0) - c(r))^2 \rangle \propto r^{2/3}. \tag{22.8} $$


22.4.3 Inner Scale 


The turbulent cascade persists down to the diffusion scale at which viscosity acts to damp out the
flows and irreversibly mix any passive additives (thus destroying additive variance).

The diffusion scale is such that the damping rate $u^{-1}\, \partial u/\partial t \sim \nu/L^2 \sim 1/\tau(L) \sim v(L)/L$ with
$\nu \sim v_T \lambda$ the kinematic viscosity. But with $v(L) = v(L_{\rm outer})(L/L_{\rm outer})^{1/3}$ this gives

$$ L_{\rm inner} \sim L_{\rm outer} \left(\frac{\nu}{L_{\rm outer}\, v(L_{\rm outer})}\right)^{3/4} \tag{22.9} $$
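A short numerical sketch of the inner-scale estimate follows. It assumes the Kolmogorov scaling $v(L) \propto L^{1/3}$ holds all the way down to the diffusion scale; the function names are hypothetical and the atmospheric inputs (an outer scale of 1 km, an outer-scale velocity of 10 m/s, $\nu \approx 0.15$ cm$^2$/s for air) are assumed illustrative values.

```python
def eddy_velocity(length, l_outer, v_outer):
    """Kolmogorov scaling v(L) = v(L_outer) (L / L_outer)^(1/3)."""
    return v_outer * (length / l_outer) ** (1.0 / 3.0)


def inner_scale(l_outer, v_outer, nu):
    """Scale at which the viscous rate nu / L^2 matches the turnover rate v(L) / L,
    i.e. L_inner = L_outer * (nu / (L_outer * v_outer))^(3/4)."""
    return l_outer * (nu / (l_outer * v_outer)) ** 0.75


if __name__ == "__main__":
    l_outer = 1.0e5  # cm, assumed outer scale (1 km)
    v_outer = 1.0e3  # cm/s, assumed velocity on the outer scale (10 m/s)
    nu = 0.15        # cm^2/s, assumed kinematic viscosity of air
    l_in = inner_scale(l_outer, v_outer, nu)
    print("outer-scale Reynolds number:", l_outer * v_outer / nu)
    print("inner scale (cm):", l_in)
    print("eddy velocity at inner scale (cm/s):", eddy_velocity(l_in, l_outer, v_outer))
```

Written in terms of the outer-scale Reynolds number, the result is $L_{\rm inner} \sim L_{\rm outer}\, {\rm Re}^{-3/4}$, so a large Reynolds number implies a wide inertial range.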


22.4.4 Atmospheric Seeing 

An interesting application of turbulence theory is provided by atmospheric seeing which arises from
turbulence driven by large scale wind shear mixing air with inhomogeneous entropy, water vapor
etc. Since atmospheric turbulence is strongly sub-sonic we can assume that the air is in pressure
equilibrium, so fluctuations in the entropy are reflected in fluctuations in the density and refractive
index $\delta n$.

These refractive index fluctuations cause corrugation of the wavefronts from distant sources.
The vertical deviation of the wavefront (or effectively the phase fluctuation) is proportional to the
integral of the air density, and has the same spectral index, $P(k) \propto k^{-11/3}$, and the two-dimensional
structure function is $S(r) \propto r^{5/3}$. A realization of a wavefront after propagation through a turbulent
atmosphere is shown in figure 22.2.

The structure function for the phase fluctuations is also a power law, and is conventionally
parameterized in terms of the Fried length $r_0$ as

$$ S_\varphi(r) = 6.88\, (r/r_0)^{5/3}. \tag{22.10} $$

The Fried length is therefore the scale over which the rms phase difference is $\sqrt{6.88}$, so different
parts of the wave-front separated by $r_0$ or more will be significantly out of phase with each other.
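The parameterization (22.10) can be evaluated directly to see how quickly the wavefront decorrelates. In the sketch below the function names and the 10 cm Fried length are illustrative assumptions.

```python
import math


def phase_structure_function(r, r0):
    """S(r) = 6.88 (r / r0)^(5/3), the mean square phase difference in rad^2."""
    return 6.88 * (r / r0) ** (5.0 / 3.0)


def rms_phase_difference(r, r0):
    """Root mean square phase difference (radians) between points a distance r apart."""
    return math.sqrt(phase_structure_function(r, r0))


if __name__ == "__main__":
    r0 = 10.0  # cm, assumed Fried length
    for r in (1.0, 10.0, 100.0):
        print(r, rms_phase_difference(r, r0))
```

At $r = r_0$ the rms phase difference is $\sqrt{6.88}$ radians, a sizeable fraction of a full cycle, while for $r \ll r_0$ the wavefront is nearly flat.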

We saw earlier that the optical transfer function of the telescope (the Fourier transform of
the point spread function) is proportional to $e^{-S_\varphi/2}$ with $S_\varphi$ the structure function for phase
fluctuations. Since $S_\varphi \propto r^{5/3}$, taking the transform leads to a PSF $g(\theta)$ with a core of radius
$\theta_c \sim \lambda/r_0$ and with a power-law halo or ‘aureole’ $g(\theta) \propto \theta^{-11/3}$ for $\theta \gg \theta_c$.

Figure 22.2: Realization of wavefront from a distant source after propagating through the
atmosphere. The projected phase fluctuations were modeled as a Gaussian random field.

This is for an ideal imaging system. In real telescopes there are usually sharp edges to the pupil or
input stop, and this results in wings of the PSF with $g \propto r^{-3}$. There is also usually mirror roughness
which typically exceeds the atmospheric corrugation on small scales $\ll r_0$ and typically generates a
PSF with $g \propto r^{-2}$. This indicates that the spectrum of mirror errors has index $n \simeq -2$, which is
2-dimensional flicker noise. Putting these effects together suggests that a real PSF will consist of a
core, surrounded by a series of power law extensions of progressively shallower slopes.


22.4.5 Stability 

In the atmosphere the Reynold’s number for shearing flows on scales of order km is enormous, so one 
might expect turbulence to be ubiquitous. However, it is possible for the atmosphere to be stabilized 
against turbulence by a positive entropy gradient (entropy increasing with altitude) since a positive 
entropy gradient means it costs energy to create a large scale eddy. Comparing the kinetic energy 
in velocity shear in a region of size L to the potential energy cost for turning over a cell of this size 
leads to Richardson’s criterion for stability. 

The kinetic energy is straightforward. For a cell of size $L$ it is $E_{\rm kin} \sim \rho L^3 (L \nabla u)^2 \sim \rho (\nabla u)^2 L^5$.

To compute the energy cost, recall that the specific entropy is

$$ s = \frac{k}{m}\left(\frac{3}{2}\ln T - \ln n\right) \qquad (22.11) $$

where $m$ is the mass of the air molecules. Assuming adiabaticity, the density and temperature are
related by

$$ n = \frac{T^{3/2}}{\exp(ms/k)}. \qquad (22.12) $$


Let’s say we take some parcel of gas from altitude $z'$ at which the specific entropy is $s'$ and raise it
to level $z$ where the entropy is $s$ (and the density and temperature are $n$ and $T$). If the density and
temperature of the displaced gas are $n'$ and $T'$ then these satisfy

$$ n' = \frac{T'^{3/2}}{\exp(ms'/k)} \qquad (22.13) $$

but they must also satisfy pressure equilibrium:

$$ n'T' = nT \qquad (22.14) $$

so, using this to eliminate the temperature $T'$ in (22.13) gives

$$ (n')^{5/2} = \frac{n^{3/2}\,T^{3/2}}{\exp(ms'/k)} \qquad (22.15) $$

or, using (22.12),

$$ \left(\frac{n'}{n}\right)^{5/2} = \frac{\exp(ms/k)}{\exp(ms'/k)} = \exp(m(s - s')/k). \qquad (22.16) $$


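With $n' = n + \Delta n$ and $s' = s - \Delta s$, and assuming small fractional changes in the density, entropy
etc.,

$$ 1 + \frac{5}{2}\frac{\Delta n}{n} \simeq 1 + m\Delta s/k \qquad (22.17) $$

or

$$ \frac{\Delta n}{n} = \frac{2}{5}\, m\Delta s/k. \qquad (22.18) $$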


If we overturn a parcel of fluid of size $L$ through its own length, the fractional density change is
therefore $\Delta\rho/\rho \sim mL\nabla s/k$, and the potential energy cost is $E_{\rm pot} \sim g(\Delta\rho L^3)L \sim g\rho m L^5 \nabla s/k$, so the
ratio is

$$ \frac{E_{\rm pot}}{E_{\rm kin}} \sim \frac{g m \nabla s}{k (\nabla u)^2}. \qquad (22.19) $$

An atmosphere will be stable if this dimensionless number exceeds unity.
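As a rough numerical illustration of the stability ratio (22.19), the following sketch evaluates the specific entropy (22.11) at two heights and forms the ratio. The layer properties (temperatures, densities, shear) are made-up illustrative numbers, not measurements, and the function names are hypothetical. Note also that (22.11) is the monatomic expression used in the text, so this is only an order-of-magnitude estimate for air.

```python
import math

K_B = 1.380649e-16   # Boltzmann constant [erg/K]

def specific_entropy(T, n, m):
    """Specific entropy of eq. (22.11), up to an additive constant [erg/K/g]."""
    return (K_B / m) * (1.5 * math.log(T) - math.log(n))

def stability_ratio(g, m, grad_s, grad_u):
    """Order-of-magnitude ratio E_pot/E_kin of eq. (22.19)."""
    return g * m * grad_s / (K_B * grad_u**2)

# Hypothetical layer: conditions at two heights separated by dz (assumed numbers).
m_air = 4.8e-23                  # g, roughly 29 atomic mass units
dz = 1.0e5                       # cm (1 km)
s_low = specific_entropy(280.0, 2.5e19, m_air)
s_high = specific_entropy(275.0, 2.3e19, m_air)
grad_s = (s_high - s_low) / dz   # positive: entropy increases with altitude
grad_u = 1.0e-2                  # s^-1, i.e. 10 m/s of shear per km
ratio = stability_ratio(981.0, m_air, grad_s, grad_u)
print(f"E_pot/E_kin ~ {ratio:.2f}")
```

With these assumed numbers the ratio exceeds unity, so such a layer would be stable against overturning; doubling the shear reduces the ratio by a factor of four.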


22.5 Problems 

22.5.1 Kolmogorov turbulence 

Give an order of magnitude estimate for the specific energy ‘flux’ (from parent to daughter eddies)
in a turbulent cascade in terms of the characteristic velocity $u_L$ of eddies of size $\sim L$. Thereby
obtain the Kolmogorov scaling law for fully developed turbulence in the form $u_L \propto L^\gamma$, and the
corresponding law for the turnover time $t(L)$.

If the mean free path in air is $l = 4 \times 10^{-6}$ cm and the mean thermal velocity is $v_T = 4 \times 10^4$ cm/s,
estimate the time for atoms to diffuse a distance $L$.

By combining the above results, estimate the ‘inner scale’ at which the bulk kinetic energy is
dissipated by viscosity into heat, if the ‘outer scale’ for atmospheric turbulence is $\sim 100$ m and the
velocity at this scale is $\sim 20$ m/s.








Chapter 23 

Supersonic Flows and Shocks 


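Subsonic flows with $u \ll c_s$ are nearly incompressible while for transonic ($u \sim c_s$) and supersonic
($u > c_s$) flows compression of the gas is important.

One can see this from Bernoulli’s equation $u^2/2 + h = {\rm constant}$, with $dh = dP/\rho$ as we will
assume adiabaticity. Writing $dP = (dP/d\rho)_s\, d\rho = c_s^2\, d\rho$ and $du^2/2 = u\, du$ gives

$$ u^2 \frac{du}{u} + c_s^2 \frac{d\rho}{\rho} = 0 \qquad (23.1) $$

or

$$ \frac{\Delta\rho}{\rho} = -M^2 \frac{\Delta u}{u} \qquad (23.2) $$

where the Mach number is defined by

$$ M = u/c_s. \qquad (23.3) $$

It follows that

$$ \left|\frac{\Delta\rho}{\rho}\right| \left\{ \begin{array}{c} \ll \\ \gg \end{array} \right\} \left|\frac{\Delta u}{u}\right| \quad {\rm for} \quad \left\{ \begin{array}{c} M \ll 1 \\ M \gg 1 \end{array} \right. \qquad (23.4) $$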


23.1 The de Laval Nozzle 


An interesting application is the de Laval nozzle in which gas is accelerated to supersonic velocities. 
This is the basis of rocketry and the physics is relevant for astrophysical jets. 

In the de Laval nozzle gas passes adiabatically through a constricting venturi. The continuity
equation is $\rho A u = {\rm constant}$, or $d\rho/\rho + du/u + dA/A = 0$. Using (23.2) this says

$$ (1 - M^2)\,\frac{du}{u} = -\frac{dA}{A}. \qquad (23.5) $$


For low Mach number M < 1, a constriction dA < 0 causes an acceleration du > 0 so subsonic gas 
entering a venturi will accelerate. For M > 1 though a constriction requires a deceleration. Thus, 
if one can arrange that the velocity just equals the sound speed in the waist of the venturi, the gas 
can be accelerated both entering and leaving the venturi. 

This is an effective way to convert thermal energy in the gas into bulk kinetic energy. 
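The sign behaviour implied by (23.5) can be checked with a few lines of code. This is a minimal sketch; the function name `velocity_response` is hypothetical and the inputs are arbitrary small fractional area changes.

```python
def velocity_response(mach, dA_over_A):
    """Fractional velocity change du/u from eq. (23.5): (1 - M^2) du/u = -dA/A."""
    if mach == 1.0:
        raise ValueError("eq. (23.5) is singular at M = 1; the sonic point needs dA = 0")
    return -dA_over_A / (1.0 - mach**2)

for mach, dA in [(0.5, -0.01), (0.5, +0.01), (2.0, -0.01), (2.0, +0.01)]:
    du = velocity_response(mach, dA)
    print(f"M = {mach}, dA/A = {dA:+.2f} -> du/u = {du:+.4f}")
```

Subsonic gas speeds up in the converging section and supersonic gas speeds up in the diverging section, which is why the sonic point must sit at the waist where $dA = 0$.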


23.2 Shock Waves 

If a subsonic flow passes an obstacle, sound waves can propagate upstream and ‘warn’ the incoming 
gas, so the flow can adjust itself to accommodate the obstacle. For a supersonic flow this is im¬ 
possible and obstacles lead inevitably to shock waves. Gas encountering a shock suffers a sudden 
and irreversible increase in entropy as bulk kinetic energy is converted to heat. The microscopic 
behavior of the shock is complicated, but important features can be obtained from the principles of 
conservation of mass, momentum and energy. Here we will consider collisional shocks.

Figure 23.1: The upper sub-figure shows a piston being driven along a tube at velocity $v_{\rm piston}$.
Initially the tube contains gas with sound speed $c_1 < v_{\rm piston}$. A ‘shock-wave’ precedes the piston and
propagates into the cold gas. In the shock, collisions convert bulk kinetic energy into heat. Behind
the shock the tube contains heated and compressed gas. The lower sub-figure shows the same thing
from the point of view of an observer who moves along with the shock. The basic features of the
shock (shock velocity, density and pressure jumps, etc.) can be determined from continuity of flux
of mass, momentum and energy, independent of the details of the collisions happening in the shock.


23.2.1 The Shock Tube 


Consider a tube containing gas with density $\rho_1$, pressure $P_1$ and sound speed $c_1$ with a piston being
driven along the tube at velocity $v_p > c_1$ as illustrated in figure 23.1. The gas particles initially
impacted by the piston will be strongly accelerated. They will in turn collide with particles further
along the tube, and it is not unreasonable to assume that a disturbance will proceed along the tube
leaving an irreversibly heated body of gas in its wake. This is known as a collisional shock: a narrow
region within which collisions rapidly generate entropy.

To analyze this it is convenient to shift to a moving frame of reference in which the putative
shock is stationary, as illustrated in the lower part of figure 23.1. In this frame, gas arrives from one
side at velocity $u_1$ and with ‘pre-shock’ density $\rho_1$ and pressure $P_1$, gets heated and decelerated and
leaves with ‘post-shock’ velocity, density and pressure $u_2$, $\rho_2$ and $P_2$. To obtain the relation between
pre- and post-shock quantities we require that in the steady state, the flux of mass, $x$-momentum
and energy should be the same on both sides of the shock. Continuity of mass flux is trivial: we
need $\rho_1 u_1 = \rho_2 u_2$. Continuity of momentum and energy is trickier since we need to take account of
both bulk and microscopic motions.

The rate at which particles cross a unit area which is perpendicular to the flow is $dN/dt\,dA =
\int d^3v\, f({\bf v})\, v_x$. The mass flux is therefore

$$ j = m\frac{dN}{dt\,dA} = m \int d^3v\, f({\bf v})\, v_x = m \int d^3w\, f_0({\bf w})(u + w_x) \qquad (23.6) $$

where $f_0({\bf w})$ is the distribution of random velocities. From the definition $u = \langle v_x \rangle$ the mean random
velocity vanishes, $\int d^3w\, f_0({\bf w})\, w_x = 0$, and we have

$$ j = mnu = \rho u \qquad (23.7) $$

as expected.

Momentum flux is

$$ \int d^3v\, f({\bf v})\, v_x (m v_x) = m \int d^3w\, f_0({\bf w})(u + w_x)^2 = \rho u^2 + \rho\langle w_x^2 \rangle = \rho u^2 + P \qquad (23.8) $$

where we have assumed an isotropic distribution of random velocities so that $\langle w_x^2 \rangle = \langle w^2 \rangle/3 = P/\rho$.
Continuity of $x$-momentum implies continuity of $\rho u^2 + P$.

Energy flux is

$$ \begin{array}{rcl}
\int d^3v\, f({\bf v})\, v_x \left(\frac{1}{2} m v^2\right) &=& \frac{1}{2} m \int d^3w\, f_0({\bf w})(u + w_x)(u^2 + 2{\bf u}\cdot{\bf w} + w^2) \\
&=& \rho u \left(\frac{1}{2}u^2 + \frac{1}{2}\langle w^2 \rangle + \langle w_x^2 \rangle\right) \\
&=& \rho u \left(\frac{1}{2}u^2 + \epsilon + P/\rho\right) = \rho u \left(\frac{1}{2}u^2 + h\right)
\end{array} \qquad (23.9) $$

where $\epsilon = \langle w^2 \rangle/2$ is the specific internal energy and $h = \epsilon + P/\rho$ is the specific enthalpy. Since $\rho u$ is just the mass flux (which is continuous),
continuity of energy flux therefore implies continuity of $u^2/2 + h$.

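The jump conditions are then

$$ \begin{array}{rcl}
\rho_1 u_1 &=& \rho_2 u_2 \\
\rho_1 u_1^2 + P_1 &=& \rho_2 u_2^2 + P_2 \\
\frac{1}{2}u_1^2 + h_1 &=& \frac{1}{2}u_2^2 + h_2
\end{array} \qquad (23.10) $$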

These were derived here for the simple case of a monatomic gas, but they are in fact more general. 

It is interesting that $h = \epsilon + P/\rho$ and not just the internal energy $\epsilon$ appears in the energy flux
continuity equation. This means that the sum of the bulk and thermal kinetic energies of a parcel of
gas is not the same after the shock as before. This is because the volume of this element will have
changed. In fact, as we will shortly see, the gas parcel will have been compressed, and this required
work to be done which came from the gas itself.


To solve (23.10) we proceed as follows:

1. Let $u_1 = jV_1$ and $u_2 = jV_2$ where $j$ is the mass flux and $V = 1/\rho$ is the specific volume, i.e. the
volume occupied by unit mass of gas.

2. Eliminating the velocities $u_1$, $u_2$, the second of (23.10) becomes

$$ P_1 + j^2 V_1 = P_2 + j^2 V_2 \qquad (23.11) $$

or

$$ j^2 = \frac{P_2 - P_1}{V_1 - V_2}. \qquad (23.12) $$

3. Eliminating the velocities $u_1$, $u_2$ from the third of (23.10) gives

$$ h_1 + \frac{1}{2} j^2 V_1^2 = h_2 + \frac{1}{2} j^2 V_2^2 \qquad (23.13) $$

or

$$ h_1 - h_2 + \frac{1}{2}(V_1 + V_2)(P_2 - P_1) = 0 \qquad (23.14) $$

or, since $h = \epsilon + P/\rho = \epsilon + PV$,

$$ \epsilon_1 - \epsilon_2 + \frac{1}{2}(V_1 - V_2)(P_2 + P_1) = 0. \qquad (23.15) $$

If we specify $P_1$ and $V_1 = 1/\rho_1$ then either of equations (23.14) or (23.15), whichever is most
convenient, provides a relation between $P_2$ and $V_2$. This relation is known as the shock adiabat.

Once we specify one post-shock quantity such as $P_2$ (which is determined eventually by the
energetics of the piston or explosion which is driving the shock) then all other post-shock quantities
are fixed.

A shock is said to be ‘strong’ if the post-shock pressure greatly exceeds the pre-shock pressure.
In this case we set $P_1 = h_1 = \epsilon_1 = 0$ and it is not difficult to show that, for a monatomic gas, the density increases by a
factor 4.
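The jump conditions (23.10) can be checked numerically. The sketch below assumes an ideal gas with adiabatic index $\gamma$ (so $h = \gamma P/((\gamma-1)\rho)$ and $c^2 = \gamma P/\rho$) and uses the standard closed-form solution of the jump conditions in terms of the upstream Mach number $M_1 = u_1/c_1$; these closed forms are textbook results that are not derived in these notes, and the function names are hypothetical.

```python
def shock_jump(mach1, gamma=5.0 / 3.0):
    """Density, pressure and velocity ratios (post/pre) across a normal shock.

    mach1 is the upstream Mach number u1/c1 measured in the shock frame.
    """
    m2 = mach1**2
    density_ratio = (gamma + 1.0) * m2 / ((gamma - 1.0) * m2 + 2.0)
    pressure_ratio = (2.0 * gamma * m2 - (gamma - 1.0)) / (gamma + 1.0)
    return density_ratio, pressure_ratio, 1.0 / density_ratio

def flux_residuals(mach1, gamma=5.0 / 3.0):
    """Residuals of the three conditions (23.10), in units with rho1 = c1 = 1."""
    rho1, c1 = 1.0, 1.0
    p1 = rho1 * c1**2 / gamma          # from c^2 = gamma P / rho
    u1 = mach1 * c1
    r, pr, ur = shock_jump(mach1, gamma)
    rho2, p2, u2 = r * rho1, pr * p1, ur * u1

    def enthalpy(p, rho):
        return gamma / (gamma - 1.0) * p / rho

    return (rho1 * u1 - rho2 * u2,
            rho1 * u1**2 + p1 - rho2 * u2**2 - p2,
            0.5 * u1**2 + enthalpy(p1, rho1) - 0.5 * u2**2 - enthalpy(p2, rho2))

for mach in (1.5, 2.0, 10.0, 1.0e4):
    print(mach, shock_jump(mach)[0], flux_residuals(mach))
```

For a monatomic gas ($\gamma = 5/3$) the density ratio tends to $(\gamma+1)/(\gamma-1) = 4$ as the Mach number grows, in line with the strong-shock limit quoted above.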

See Landau and Lifshitz “Fluid Mechanics” for discussion of the stability of collisional shocks.

23.2.2 Vorticity Generation

Kelvin’s circulation theorem tells us that an ideal fluid which is initially vorticity free remains that 
way for ever. However, with dissipation this is no longer the case. In collisional shocks there is 
strong dissipation in the shock and, if the shock is oblique (ie the normal to the shock does not 
coincide with the flow velocity vector) vorticity can be generated. 

23.2.3 Taylor-Sedov Solution 

The Taylor-Sedov shock solution describes what happens if a point-like explosion occurs in a uniform
density gas. After some short time the explosion will have ‘swept up’ more than the mass of the
material ejected in the explosion, and subsequently one expects to find a spherical shock wave
propagating radially outward, with radius as a function of time $r = r(E, \rho, t)$ where $E$ is the energy
of the explosion and $\rho$ is the density of the ambient gas.

By dimensional analysis, if we assume that the solution has power law dependence on $E$, $\rho$ and
$t$, i.e. $r \propto E^l \rho^m t^n$, then equating powers of mass, length and time on both sides of this equation gives
the power law indices $l = 1/5$, $m = -1/5$ and $n = 2/5$ so the form of the solution is

$$ r(t) = a E^{1/5} \rho^{-1/5} t^{2/5} \qquad (23.16) $$

where $a$ is a dimensionless constant of order unity.

One could also have reached this conclusion from energy considerations. Note that the velocity
of the shock is $v = (2/5)\,r/t$ and the kinetic energy is $E_{\rm kin} \sim \rho r^3 v^2$ which is independent of $r$ and $t$
and is on the order of the initial energy injected.

The Taylor-Sedov solution applies to supernova explosions. At late times cooling can become
effective and the shock then evolves into the ‘snow-plow’ phase.
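The scaling law (23.16) is easy to evaluate. The sketch below takes the dimensionless constant $a = 1$ (an assumption; the true value is of order unity) and uses illustrative inputs chosen here, not taken from the text. The function names are hypothetical.

```python
M_P = 1.6726e-24      # proton mass [g]
YEAR = 3.156e7        # seconds per year
PC = 3.0857e18        # centimetres per parsec

def sedov_radius(E, rho, t, a=1.0):
    """Shock radius of eq. (23.16) [cm]; E in erg, rho in g/cm^3, t in s."""
    return a * (E * t**2 / rho) ** 0.2

def sedov_velocity(E, rho, t, a=1.0):
    """Shock velocity v = (2/5) r / t [cm/s]."""
    return 0.4 * sedov_radius(E, rho, t, a) / t

# Illustrative (assumed) values: 1e51 erg into hydrogen at 0.1 cm^-3, after 1000 yr.
E, rho, t = 1.0e51, 0.1 * M_P, 1000.0 * YEAR
print(f"r ~ {sedov_radius(E, rho, t) / PC:.1f} pc")
print(f"v ~ {sedov_velocity(E, rho, t) / 1.0e5:.0f} km/s")
```

Because $\rho r^3 v^2 = 0.16\,a^5 E$ at all times, the kinetic energy estimate stays fixed while the shock decelerates as $t^{-3/5}$.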

23.3 Problems 

23.3.1 Collisional shocks 

a) Write down or, if you like, derive the equations expressing the continuity of fluxes of mass,
momentum and energy for a planar collisional shock. Work in a frame of reference such that the shock
is stationary and denote the pre- and post-shock velocities by $u_1$, $u_2$ respectively and assume that
the flow velocity is perpendicular to the shock surface.

These equations relate the densities $\rho$, pressures $P$ and enthalpies $h = \epsilon + P/\rho$ on the two sides
of the shock discontinuity.

b) Take the limit of the continuity equations for a strong shock ($P_1 \ll P_2$) and show that for a
monatomic gas the density jumps by a factor 4.

c) Find the relation between the shock propagation velocity (i.e. the velocity at which the shock
moves into the unshocked medium) and the post-shock sound speed.

d) A strong spherically symmetric shock resulting from an explosion of energy $E$ is propagating
into uniform cold gas of density $\rho$. In the absence of radiative cooling the shock radius $r(t; E, \rho)$ can
only be a power-law function of its arguments:

$$ r = a E^l t^m \rho^n \qquad (23.17) $$

where $a$ is some dimensionless constant which we will assume to be of order unity. Use dimensional
analysis to find the indices $l, m, n$.

e) A supernova explosion dumps energy $\sim 10^{51}$ erg in a region containing uniform cool gas at
density $n = 1$ cm$^{-3}$. Use the ‘Taylor-Sedov’ scaling law you have just derived to compute the radius
and velocity of the shock front after 100 years.



Chapter 24 

Plasma 


24.1 Time and Length Scales 

Much of the baryonic material in the Universe is in the form of plasma; highly ionized, and therefore 
electrically conductive, gas. There are a variety of important length- and time-scales for plasmas: 


24.1.1 Plasma Frequency 


Consider an otherwise electrically neutral plasma where we take a block of electrons and displace
them sideways by an amount $x$ as illustrated in figure 24.1. This generates a restoring force so the
equation of motion is

$$ \ddot{x} = -\omega_p^2 x \qquad (24.1) $$

where the plasma frequency is

$$ \omega_p = \sqrt{4\pi n e^2/m_e} \simeq 5.6 \times 10^4 \left(\frac{n}{1\,{\rm cm}^{-3}}\right)^{1/2} {\rm s}^{-1}. \qquad (24.2) $$

Thus if we physically disturb a plasma it will ‘ring’ at the frequency $\omega_p$. Also, as we will shortly
see, electromagnetic waves can only propagate through the plasma if their frequency exceeds $\omega_p$.

Figure 24.1: If we displace a block of electrons in a plasma from their initial location (as indicated
by the dashed box) this generates an electric field which acts to try to restore neutrality. In the
absence of damping, this leads to oscillation at the plasma frequency $\omega_p = \sqrt{4\pi n e^2/m_e}$.

24.1.2 Relaxation Time


The relaxation time is the time-scale for collisions to establish a Maxwellian distribution of velocities.
We can crudely estimate this as the time between collisions with sufficiently small impact
parameter that they substantially deflect an electron. For such collisions, the potential energy at
closest approach is on the order of the typical thermal kinetic energy:

$$ \frac{e^2}{b} \sim kT \qquad (24.3) $$

so the cross-section for such collisions is

$$ \sigma \sim b^2 \sim \frac{e^4}{(kT)^2}. \qquad (24.4) $$

The mean free path is

$$ l \sim \frac{1}{n\sigma} \sim \frac{k^2 T^2}{n e^4} \sim \frac{m^2 v_T^4}{n e^4} \qquad (24.5) $$

where $v_T \sim \sqrt{kT/m_e}$ is the typical thermal velocity. The relaxation time is then

$$ t_c \sim \frac{l}{v_T} \sim \frac{m^2 v_T^3}{n e^4} \simeq 0.92\,\frac{T^{3/2}}{n}\ {\rm s} \qquad (24.6) $$

where $n$ is in units of cm$^{-3}$ and $T$ is in Kelvin.

For most astrophysical plasmas, $t_c$ is much longer than the plasma oscillation period $1/\omega_p$, but is still
short compared to most other time-scales such as the age of the system, so we expect the electrons
to have relaxed to a Maxwellian velocity distribution.
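To get a feel for the numbers, the following sketch evaluates the plasma frequency (24.2) and the relaxation time estimate (24.6) in Gaussian (cgs) units. The example temperature and density are assumed values for a warm ionized gas, and the function names are hypothetical.

```python
import math

E_CHARGE = 4.8032e-10   # electron charge [esu]
M_E = 9.1094e-28        # electron mass [g]
K_B = 1.380649e-16      # Boltzmann constant [erg/K]

def plasma_frequency(n):
    """Plasma frequency omega_p of eq. (24.2) [rad/s]; n in cm^-3."""
    return math.sqrt(4.0 * math.pi * n * E_CHARGE**2 / M_E)

def relaxation_time(T, n):
    """Order-of-magnitude relaxation time t_c of eq. (24.6) [s]; T in K, n in cm^-3."""
    v_t = math.sqrt(K_B * T / M_E)            # typical thermal velocity
    return M_E**2 * v_t**3 / (n * E_CHARGE**4)

# Assumed example: T = 1e4 K, n = 1 cm^-3.
T, n = 1.0e4, 1.0
print(f"omega_p     = {plasma_frequency(n):.3g} rad/s")
print(f"t_c         = {relaxation_time(T, n):.3g} s")
print(f"omega_p t_c = {plasma_frequency(n) * relaxation_time(T, n):.3g}")
```

The product $\omega_p t_c$ is very large for such conditions, which is the sense in which plasma oscillations are only weakly damped by collisions.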


24.1.3 Debye Length 


Consider what happens if we introduce a positive ion into an electrically neutral plasma. The
electrons will respond to the electric field of the ion, and will be attracted towards it, tending to
counteract the positive charge. They will, in general, overshoot, and there will be some oscillations
which radiate outwards. Once these have departed, the result will be an electrically neutral plasma
with a very concentrated cloud of electron charge surrounding the ion and just compensating
its electric field. The time-scale for the plasma to adjust to the implanted ion is on the
order of the inverse of the plasma frequency $\omega_p$.

This is for a cold plasma. What happens if the electrons have a finite thermal velocity? Random
thermal velocities will tend to smear out the electron charge concentration, but this will then reveal
the charge of the implanted ion, so there is a competition between these two effects. Now at a distance
$l$ from the ion, the time-scale for the velocities to act is $\tau \sim l/v_T$. Here $v_T \sim \sqrt{kT/m}$ is the typical
thermal velocity. This ‘smearing time-scale’ $\tau(l)$ increases with $l$, so there will be some distance $l_D$
such that $\tau(l_D) \sim 1/\omega_p$:

$$ l_D \sim \frac{v_T}{\omega_p} \sim \sqrt{\frac{kT}{n e^2}}. \qquad (24.7) $$

This length-scale (which will be defined more precisely below) is known as the Debye length. Its
significance is as follows: at distances $l \gg l_D$ the plasma can relax towards neutrality faster than
thermal velocities can smear out the electron charge concentration, so we expect that the ionic
charge will be effectively screened by the electrons. On scales $l \ll l_D$, on the other hand, smearing
by random motions is effective; the electron concentration is smoothed out, and the field due to the
ion will not be screened.

Landau and Lifshitz give a more quantitative calculation of the screening: In equilibrium (and
in the ensemble average state) we expect the surrounding electrons and ions to have a Boltzmann
distribution, with the density of electrons, for example, being $n_e(r) = n_e e^{e\phi(r)/kT}$ where $n_e$ is the
mean density and $\phi(r)$ is the potential due to the ion and the screening cloud. Similarly, the density
of ions will be $n_i(r) = n_i e^{-Z_i e\phi(r)/kT}$. The total mean charge density profile will be

$$ \rho(r) = Ze\delta(r) - n_e e\, e^{e\phi(r)/kT} + Z_i n_i e\, e^{-Z_i e\phi(r)/kT} \qquad (24.8) $$

where the potential $\phi(r)$ is related to $\rho(r)$ by Poisson’s equation

$$ \nabla^2\phi = -4\pi\rho. \qquad (24.9) $$

The problem here is to find a self-consistent solution to this pair of equations. In general this is
difficult, but finding a solution at large radii is easier, since we then expect $e\phi(r) \ll kT$, so we can
expand the exponentials to obtain

$$ \rho(r) \simeq Ze\delta(r) - \frac{\phi(r)}{4\pi L_D^2} \qquad (24.10) $$


where we have invoked overall charge neutrality ($n_e = Z_i n_i$) and where we have defined the Debye
length $L_D$ such that

$$ L_D^{-2} = \frac{4\pi e^2 (n_e + Z_i^2 n_i)}{kT}. \qquad (24.11) $$

This is in accord with the hand-waving argument above. Poisson’s equation, with the Laplacian
written in spherically symmetric form, is

$$ \frac{1}{r^2}\frac{d}{dr}\left(r^2 \frac{d\phi}{dr}\right) = \frac{\phi}{L_D^2} \qquad (24.12) $$

and the boundary condition is $\phi \rightarrow Ze/r$ as $r \rightarrow 0$. This has solution

$$ \phi(r) = \frac{Ze}{r}\, e^{-r/L_D}. \qquad (24.13) $$

• The Debye length is the screening length; on scales $r \ll L_D$ the potential is that of the bare
ion, while on larger scales the potential is exponentially suppressed.

• On scales $\gg L_D$ one can ignore the particulate nature of the plasma.

• The Debye length is on the order of the thermal velocity times the inverse of the plasma
frequency. This is in accord with the idea that the plasma will tend to relax towards charge-neutrality
on time-scale $1/\omega_p$, resulting in an enhancement of electrons around the ion, but
thermal motions will tend to smear this out.

• We have assumed here that the electrons are effectively collisionless. 
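The screening length and potential can be evaluated numerically. This sketch assumes Gaussian units and the definition $L_D^{-2} = 4\pi e^2 (n_e + Z_i^2 n_i)/kT$, which is what the linearised Poisson equation gives for a neutral plasma with $n_e = Z_i n_i$; the example conditions and function names are assumptions made for illustration.

```python
import math

E_CHARGE = 4.8032e-10   # electron charge [esu]
M_E = 9.1094e-28        # electron mass [g]
K_B = 1.380649e-16      # Boltzmann constant [erg/K]

def debye_length(T, n_e, Z_i=1.0):
    """Debye length [cm] for a neutral plasma with n_e = Z_i n_i; T in K, n_e in cm^-3."""
    n_i = n_e / Z_i
    return math.sqrt(K_B * T / (4.0 * math.pi * E_CHARGE**2 * (n_e + Z_i**2 * n_i)))

def screened_potential(r, Z, L_D):
    """Screened potential (Ze/r) exp(-r/L_D) of the implanted ion [statvolt]."""
    return Z * E_CHARGE / r * math.exp(-r / L_D)

# Assumed example: ionized hydrogen at T = 1e4 K, n_e = 1 cm^-3.
L_D = debye_length(1.0e4, 1.0)
print(f"L_D = {L_D:.3g} cm")
for x in (0.1, 1.0, 5.0):
    bare = E_CHARGE / (x * L_D)
    print(f"r = {x} L_D: screened/bare = {screened_potential(x * L_D, 1.0, L_D) / bare:.3g}")
```

For hydrogen this definition gives $L_D = v_T/(\sqrt{2}\,\omega_p)$, consistent with the order-of-magnitude estimate (24.7).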


24.2 Electromagnetic Waves in a Plasma 

The mobile charges in a plasma have a profound influence on the propagation of electromagnetic 
waves. Here we will consider two applications; the dispersion of waves propagating through a plasma 
and the rotation of polarization which occurs if there is a large-scale magnetic field present. Both 
of these effects can be used as diagnostics to probe the properties of the medium through which the 
waves are propagating. 

24.2.1 Dispersion in a Cold Plasma 

If we consider waves with period much less than the relaxation time we can ignore collisions. Any
electric field ${\bf E}$ associated with the wave will drive a current in the electrons. The equation of motion
for an electron is $\ddot{\bf x} = e{\bf E}/m$, so the current generated by the field obeys

$$ \frac{d{\bf j}}{dt} = \frac{n e^2 {\bf E}}{m}. \qquad (24.14) $$

This is valid provided the period of the waves is much less than the relaxation time, as is usually
the case.

If we assume a wave-like solution in which all properties vary as $e^{i({\bf k}\cdot{\bf r} - \omega t)}$ then the current is

$$ {\bf j} = \frac{i n e^2}{\omega m}\,{\bf E} = \frac{i\omega_p^2}{4\pi\omega}\,{\bf E}. \qquad (24.15) $$

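The third and fourth of Maxwell’s equations become

$$ \begin{array}{rcl}
i{\bf k} \times {\bf E} &=& \frac{i\omega}{c}\,{\bf B} \\
i{\bf k} \times {\bf B} &=& \frac{4\pi{\bf j}}{c} - \frac{i\omega}{c}\,{\bf E} = -i\frac{\omega}{c}\left(1 - \frac{\omega_p^2}{\omega^2}\right){\bf E}.
\end{array} \qquad (24.16) $$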


If we let the wave propagate along the $z$-axis, then on combining these equations we obtain the
dispersion relation

$$ \omega^2 = c^2 k^2 + \omega_p^2. \qquad (24.17) $$

This is the same as the dispersion relation for a Klein-Gordon field with mass $m = \hbar\omega_p/c^2$. The
presence of the mobile sea of electron charge density has evidently endowed the electromagnetic
waves with a non-zero ‘effective mass’.


• For $\omega < \omega_p$ the wave-number becomes imaginary, and instead of traveling waves we have an
evanescent decaying behavior $E \propto e^{-|k| r}$.

• The current and electric field are 90 degrees out of phase, so $\langle {\bf j} \cdot {\bf E} \rangle = 0$. This means that no
work is done by the wave on the plasma and so the energy is reflected.

• For $\omega > \omega_p$ we have traveling wave solutions, but the waves are dispersive (see appendix).

• The group velocity is $v_{\rm group} = d\omega/dk = c^2 k/\omega = c\sqrt{1 - \omega_p^2/\omega^2}$. This is the speed at which
information propagates, and tends to zero as $\omega$ approaches $\omega_p$.

• If a source (a pulsar perhaps) emits pulses, then the pulses will be dispersed and we receive a
‘chirp’ of decreasing frequency (see problem).

• Measurement of the time of arrival vs frequency provides the dispersion measure $D = \int dl\, n_e$.
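The chirp can be made quantitative with a short calculation. Expanding $1/v_{\rm group} \simeq (1/c)(1 + \omega_p^2/2\omega^2)$ for $\omega \gg \omega_p$ and integrating along the path gives an extra delay $e^2 D/(2\pi m_e c\, \nu^2)$ relative to a vacuum signal, with $\nu = \omega/2\pi$. The sketch below compares this approximation with the exact group-velocity travel time for a uniform medium; the chosen density, path length and frequency are illustrative assumptions, and the function names are hypothetical.

```python
import math

E_CHARGE = 4.8032e-10   # electron charge [esu]
M_E = 9.1094e-28        # electron mass [g]
C = 2.99792458e10       # speed of light [cm/s]
PC = 3.0857e18          # centimetres per parsec

def plasma_frequency(n):
    """Plasma frequency [rad/s]; n in cm^-3."""
    return math.sqrt(4.0 * math.pi * n * E_CHARGE**2 / M_E)

def group_velocity(omega, n):
    """Group velocity c sqrt(1 - omega_p^2/omega^2) [cm/s]."""
    w_p = plasma_frequency(n)
    if omega <= w_p:
        raise ValueError("no propagating wave for omega <= omega_p")
    return C * math.sqrt(1.0 - (w_p / omega) ** 2)

def dispersion_delay(nu, dm):
    """Delay relative to vacuum for omega >> omega_p [s]; nu in Hz, dm in cm^-2."""
    return E_CHARGE**2 * dm / (2.0 * math.pi * M_E * C * nu**2)

# Assumed example: uniform n_e = 1 cm^-3 over 1 pc, observed at 10 MHz.
n, path, nu = 1.0, PC, 1.0e7
exact = path / group_velocity(2.0 * math.pi * nu, n) - path / C
approx = dispersion_delay(nu, n * path)
print(f"exact delay  = {exact:.4g} s")
print(f"approx delay = {approx:.4g} s")
```

The delay scales as $\nu^{-2}$, so the high frequencies of a pulse arrive first and the received signal sweeps downward in frequency.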


24.2.2 Faraday Rotation 

Plasmas are highly conductive, and so cannot sustain any large-scale electric fields. They are, 
however, often threaded by large scale magnetic fields. The presence of a magnetic field introduces a 
new frequency in the problem: the electrons will gyrate around the field lines at the gyro-frequency 


$$ \omega_G = \frac{eB}{mc} = 1.76 \times 10^7 \left(\frac{B}{1\,{\rm gauss}}\right) {\rm s}^{-1}. \qquad (24.18) $$


Just as with bulk oscillation of the plasma at the plasma frequency, we might expect the internal 
oscillations associated with the magnetic field to affect the oscillations of the electromagnetic field 
and thereby further modify the dispersion relation. This is indeed the case. We shall see that the 
presence of a magnetic field causes circularly polarized waves to propagate at different velocities, 
depending on their helicity. An important consequence of this is that the plane of polarization for 
linearly polarized waves rotates as it propagates through a magnetized plasma. This phenomenon 
is known as Faraday rotation, and provides an important diagnostic for magnetic fields. 

Calculation of the dispersion relation is complicated in general since this will depend on the 
wave direction and on the polarization. Here we will consider for simplicity only waves propagating 
parallel to the field lines, which we shall take to lie along the z-axis. 

Consider a circularly polarized wave propagating along the $z$-axis, such that the electric field lies
in the $x$-$y$ plane and is

$$ \left[\begin{array}{c} E_x \\ E_y \end{array}\right] = E_0 \left[\begin{array}{c} \cos(\omega t) \\ \sin(\omega t) \end{array}\right]. \qquad (24.19) $$

In the absence of any large-scale magnetic field, and assuming electron velocities $v \ll c$ so we can
neglect the effect on the electron of the magnetic field of the wave, the acceleration of an electron is

$$ \ddot{\bf x} = e{\bf E}/m \qquad (24.20) $$

as before. This acceleration vector is the time derivative of a velocity

$$ \left[\begin{array}{c} v_x \\ v_y \end{array}\right] = \frac{eE_0}{\omega m} \left[\begin{array}{c} \sin(\omega t) \\ -\cos(\omega t) \end{array}\right] \qquad (24.21) $$

which in turn is the derivative of a displacement

$$ \left[\begin{array}{c} x \\ y \end{array}\right] = -\frac{eE_0}{\omega^2 m} \left[\begin{array}{c} \cos(\omega t) \\ \sin(\omega t) \end{array}\right]. \qquad (24.22) $$


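Thus each electron moves in a circular orbit of radius $r_0 = eE_0/\omega^2 m$ at velocity $v_0 = eE_0/\omega m$.

Now add a static magnetic field parallel to the $z$-axis. There will now be an additional $e{\bf v} \times
{\bf B}/(mc)$ component to the acceleration. If we guess that the velocity is

$$ \left[\begin{array}{c} v_x \\ v_y \end{array}\right] = v_0 \left[\begin{array}{c} \sin(\omega t) \\ -\cos(\omega t) \end{array}\right] \qquad (24.23) $$

(with $v_0$ to be determined) the extra component of the acceleration is

$$ \frac{e{\bf v} \times {\bf B}}{mc} = \frac{eB}{mc} \left[\begin{array}{c} v_y \\ -v_x \end{array}\right] = -\frac{e v_0 B}{mc} \left[\begin{array}{c} \cos(\omega t) \\ \sin(\omega t) \end{array}\right]. \qquad (24.24) $$

The magnetic force is therefore anti-parallel to the electric force. The equation of motion is

$$ \ddot{\bf x} = \frac{e}{m}\left[{\bf E} + {\bf v} \times {\bf B}/c\right] \qquad (24.25) $$

or

$$ \ddot{\bf x} = \omega v_0 \left[\begin{array}{c} \cos(\omega t) \\ \sin(\omega t) \end{array}\right] = \frac{e}{m}\left[E_0 - v_0 B/c\right] \left[\begin{array}{c} \cos(\omega t) \\ \sin(\omega t) \end{array}\right]. \qquad (24.26) $$

Our guess is then indeed a solution, provided

$$ v_0 \left(\omega + \frac{eB}{mc}\right) = \frac{e}{m} E_0 \qquad (24.27) $$

or equivalently

$$ v_0 = \frac{eE_0}{m(\omega + \omega_G)}. \qquad (24.28) $$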


What is happening is that the magnetic force is opposing, and therefore hindering, the electric
centripetal force, resulting in a velocity which is smaller by a factor $1/(1 + \omega_G/\omega)$ (see figure 24.2).
For a magnetic field which is anti-parallel to the wave-vector, or equivalently for a wave of the
opposite helicity (with the field, velocity and displacement rotating clockwise rather than anti-clockwise),
the magnetic force adds to the centripetal acceleration due to the electric field, and the
velocity is then larger by a factor $1/(1 - \omega_G/\omega)$.

The propagation of an electromagnetic wave of a certain frequency $\omega$ through a magnetized
plasma is, for the case of ${\bf B}$ parallel or anti-parallel to the wave-vector ${\bf k}$, identical to that for a
non-magnetized plasma consisting of particles with the same number density and charge but with a
slightly different mass $m' = m(1 \pm \omega_G/\omega)$, or equivalently to a wave in a non-magnetized plasma with
a slightly different plasma frequency

$$ \omega_p'^2 = \frac{\omega_p^2}{1 \pm \omega_G/\omega}. \qquad (24.29) $$

Figure 24.2: For a circularly polarized wave, the electrons are confined to circular orbits by the centripetal
acceleration $eE_0/m$ and therefore move with speed $v_0 = eE_0/m\omega$ (left panel). If there
is a magnetic field parallel (anti-parallel) to the wave vector the centripetal acceleration is partially
annulled (enhanced) by a magnetic force, and the velocity is reduced (increased) by a factor
$1/(1 \pm \omega_G/\omega)$.


The dispersion relation is therefore

$$ c^2 k_\pm^2 = \omega^2 - \frac{\omega_p^2}{1 \pm \omega_G/\omega}. \qquad (24.30) $$


This is the dispersion relation for circularly polarized waves. These are the ‘propagation eigenstates’
of a magnetized plasma. Here we are more interested in the propagation of radiation which
is initially linearly polarized. We can obtain its evolution as follows: we decompose the initially
linearly polarized wave into the sum of two circular waves of opposite helicity. These then propagate
independently through some path length of plasma and we can then recombine them to obtain the
final field.

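For example, consider the wave

$$ \left[\begin{array}{c} E_x(z,t) \\ E_y(z,t) \end{array}\right] = \frac{E_0}{2} \left( \left[\begin{array}{c} \cos(\omega t - k_+ z) \\ -\sin(\omega t - k_+ z) \end{array}\right] + \left[\begin{array}{c} \cos(\omega t - k_- z) \\ \sin(\omega t - k_- z) \end{array}\right] \right) \qquad (24.31) $$

i.e. the sum of two circularly polarized waves. At $t = 0$, $z = 0$ the field is $(E_x, E_y) = (E_0, 0)$. Now
write $k_\pm = k \pm \Delta k$ to obtain

$$ \left[\begin{array}{c} E_x(z,t) \\ E_y(z,t) \end{array}\right] = E_0 \cos(\omega t - kz) \left[\begin{array}{c} \cos z\Delta k \\ \sin z\Delta k \end{array}\right] = E_0 \cos(\omega t - kz)\, R(z\Delta k) \left[\begin{array}{c} 1 \\ 0 \end{array}\right] \qquad (24.32) $$

where $R(\theta)$ denotes the 2-D rotation matrix. For $\Delta k \ll k$ this is a linearly polarized wave whose
plane of polarization rotates progressively with distance, $\theta(z) = z\Delta k$. More generally, if the plasma
density and/or magnetic field (and therefore $\Delta k = (k_+ - k_-)/2$) varies with position the rotation
angle is

$$ \theta(z) = \frac{1}{2} \int dz\, (k_+ - k_-) \simeq \frac{1}{2c\omega^2} \int dz\, \omega_p^2\, \omega_G \qquad (24.33) $$

or equivalently

$$ \theta = \frac{2\pi e^3}{m^2 c^2 \omega^2} \int dl\, n B. \qquad (24.34) $$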

The line integral appearing here is called the rotation measure. Now of course we usually do not 
know what the initial electric field orientation was, but the rotation angle depends on frequency, so 
by measuring the polarization angle as a function of frequency we can infer the rotation measure. 
Combining the rotation measure and the dispersion measure provides independent estimates of the 
large-scale field B and also the integral of the electron density. 
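As a numerical illustration of (24.34), the sketch below evaluates the rotation angle for a uniform slab in Gaussian units and converts it to the quantity $\theta/\lambda^2$ in rad m$^{-2}$, the form in which rotation measures are usually quoted by observers (that convention is an addition here, not from the text). The field strength, density and path length are assumed example values, and the function names are hypothetical.

```python
import math

E_CHARGE = 4.8032e-10   # electron charge [esu]
M_E = 9.1094e-28        # electron mass [g]
C = 2.99792458e10       # speed of light [cm/s]
PC = 3.0857e18          # centimetres per parsec

def rotation_angle(nu, n, b_par, path):
    """Faraday rotation angle of eq. (24.34) for uniform n and B [rad].

    nu in Hz, n in cm^-3, b_par (field along the line of sight) in gauss, path in cm.
    """
    omega = 2.0 * math.pi * nu
    return (2.0 * math.pi * E_CHARGE**3 * n * b_par * path
            / (M_E**2 * C**2 * omega**2))

def rotation_per_wavelength_squared(n, b_par, path):
    """theta / lambda^2 in rad m^-2; independent of the frequency used."""
    nu = 1.0e9
    wavelength_m = C / nu / 100.0
    return rotation_angle(nu, n, b_par, path) / wavelength_m**2

# Assumed example: n = 1 cm^-3, B = 1 microgauss, path = 1 pc.
print(f"theta at 1 GHz  = {rotation_angle(1.0e9, 1.0, 1.0e-6, PC):.3g} rad")
print(f"theta/lambda^2  = {rotation_per_wavelength_squared(1.0, 1.0e-6, PC):.3g} rad/m^2")
```

Since the angle scales as $\omega^{-2}$, polarization angles measured at two or more frequencies suffice to separate the rotation from the unknown intrinsic orientation.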
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24.3 Problems 


24.3.1 Dispersion Measure 

A distant pulsar emits short pulses which propagate to us through an intervening ionized medium. 

a) Sketch the ‘periodogram’ (power as a function of frequency and time) for the received signal. 

b) Show that the signal arrival time is related to frequency as 



(24.35) 


where $\omega_p$ is the plasma frequency.

c) Under what conditions is the approximation above valid?

Chapter 25

The Laws of Gravity 


25.1 General Relativity 

The best theory for gravity is Einstein’s general theory of relativity. This is a classical theory 
of geometrodynamics, and has not yet been accommodated within a proper quantum mechanical 
framework. 

In general relativity matter causes curvature of space-time, which in turn influences the orbits of
particles etc. As Wheeler has colorfully put it, “space tells matter how to move; matter tells space
how to curve”. Central to general relativity is the metric tensor $g_{\mu\nu}$ which is the generalization
to curved space-times of the Minkowski metric $\eta_{\mu\nu}$ from special relativity. The basic equation of
general relativity is a relation between the curvature tensor $G_{\mu\nu}$, which is constructed from second
derivatives of the metric, and the stress-energy tensor $T_{\mu\nu}$ describing the matter.

In the weak-field limit of GR the metric is $g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$ where $h_{\mu\nu}$ is a small perturbation.
If, in addition, the matter configuration is slowly varying (velocity of massive particles $\ll c$) then
GR becomes identical to Newtonian gravity. Newtonian gravity provides a good description of most
weak field phenomena with the exception of gravitational waves (these are usually weak fields, but
require rapidly moving sources to be efficiently generated).

The bending of light by weak gravitational fields can also be ‘fudged’ simply by multiplying the 
Newtonian result for a test particle of velocity c by a factor two. 


25.2 Newtonian Gravity 


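In Newton’s theory the acceleration of a particle is the sum over all other particles of $G$ times the
mass times the inverse square of the distance:

$$ \ddot{\bf x}_i = \sum_{j \ne i} \frac{G m_j ({\bf x}_j - {\bf x}_i)}{|{\bf x}_i - {\bf x}_j|^3} \qquad (25.1) $$

where

$$ G \simeq 6.67 \times 10^{-8}\ {\rm cm}^3\, {\rm g}^{-1}\, {\rm s}^{-2}. \qquad (25.2) $$

For a continuous density distribution this is

$$ \ddot{\bf x} = {\bf g}({\bf x}) = G \int d^3x'\, \rho({\bf x}')\, \frac{{\bf x}' - {\bf x}}{|{\bf x}' - {\bf x}|^3}. \qquad (25.3) $$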

The gravity ${\bf g}$ can be written as the gradient of the gravitational potential ${\bf g} = -\nabla\Phi$ where

$$ \Phi({\bf x}) = -G \int d^3x'\, \frac{\rho({\bf x}')}{|{\bf x}' - {\bf x}|}. \qquad (25.4) $$

Taking the gradient of $\Phi({\bf x})$ one recovers (25.3).

In addition to the explicit formula for the potential as a spatial integral (25.4) there is also an
equivalent local relationship between the Laplacian of the potential $\nabla^2\Phi$ and the density $\rho$

$$ \nabla^2\Phi = 4\pi G\rho \qquad (25.5) $$

which is Poisson’s equation.

One way to establish the equivalence of (25.4) and (25.5) is to take the divergence of the gravity (25.3):

$$ \nabla_x \cdot {\bf g}({\bf x}) = -\nabla^2\Phi = G \int d^3x'\, \rho({\bf x}')\, \nabla_x \cdot \left(\frac{{\bf x}' - {\bf x}}{|{\bf x}' - {\bf x}|^3}\right). \qquad (25.6) $$

Now the divergence $\nabla \cdot ({\bf x}/x^3)$ appearing here is readily computed and is found to vanish for ${\bf x} \ne 0$, so
it must therefore be proportional to the Dirac $\delta$-function. To obtain the constant of proportionality
integrate $\nabla \cdot ({\bf x}/x^3)$ over a small sphere and use the divergence theorem to obtain $\int d^3x\, \nabla \cdot ({\bf x}/x^3) =
\int d{\bf S} \cdot {\bf x}/x^3 = 4\pi$ so we have $\nabla \cdot ({\bf x}/x^3) = 4\pi\delta({\bf x})$ and using this in (25.6) yields Poisson’s equation
(25.5).


Integrating (25.5) over some region gives

$$ 4\pi G M = 4\pi G \int d^3x\, \rho({\bf x}) = \int d^3x\, \nabla^2\Phi = \int d{\bf S} \cdot \nabla\Phi \qquad (25.7) $$

so the integral of the inward normal component of the gravity over some closed surface is equal to $4\pi G$
times the mass enclosed. This is Gauss’ law.

For a system of point masses the gravitational binding energy is defined as

$$ W = -\frac{1}{2} \sum_i \sum_{j \ne i} \frac{G m_i m_j}{|{\bf x}_i - {\bf x}_j|} \qquad (25.8) $$

the factor 1/2 arising because each pair is counted twice in this sum.
For a continuous distribution

$$ W = \frac{1}{2} \int d^3x\, \rho({\bf x})\, \Phi({\bf x}). \qquad (25.9) $$

Note that ‘continuous distribution’ results can be used for point masses with $\rho({\bf r}) \rightarrow \sum_i m_i\, \delta({\bf r} - {\bf r}_i)$.


25.3 Spherical Systems 

25.3.1 Newton’s Theorems 


Newton found that 


• g = 0 inside a spherical shell of mass. 


• The gravity outside such a shell is the same as for an equivalent mass at the origin. 


These can be proved geometrically (see Binney and Tremaine), and they also follow directly from 
Gauss’ law and spherical symmetry. 
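
Both theorems are easy to check numerically. The sketch below is an illustration only: the function name `shell_gravity`, the units with $G = 1$ and the number of rings are assumptions made here. It cuts a thin uniform shell into rings and sums the radial component of their pull with the midpoint rule.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (assumed for this demo)


def shell_gravity(r, a=1.0, m=1.0, n=20000):
    """Radial gravity at distance r from the centre of a thin uniform shell.

    The shell (radius a, mass m) is cut into n rings in polar angle theta and
    the radial component of each ring's pull is summed with the midpoint rule.
    Negative values point towards the centre. Not valid exactly at r == a.
    """
    dtheta = math.pi / n
    g = 0.0
    for k in range(n):
        theta = (k + 0.5) * dtheta
        dm = 0.5 * m * math.sin(theta) * dtheta       # mass of this ring
        along = r - a * math.cos(theta)               # radial offset from ring to field point
        dist2 = r * r + a * a - 2.0 * a * r * math.cos(theta)
        g -= G * dm * along / dist2 ** 1.5
    return g


inside = shell_gravity(0.5)    # field point inside the shell
outside = shell_gravity(2.0)   # field point outside the shell
point_mass = -G * 1.0 / 2.0 ** 2
print(f"inside: {inside:.2e}, outside: {outside:.6f}, point mass: {point_mass:.6f}")
```

The inside value should vanish to within the quadrature error, and the outside value should match that of a point mass at the origin.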

These theorems imply that the gravity $\mathbf{g}(\mathbf{r})$ for an arbitrary spherical system with cumulative
mass profile $M(r)$ is

$$\mathbf{g} = -\frac{GM(r)}{r^2}\, \hat{\mathbf{r}}. \qquad (25.10)$$


25.3.2 Circular and Escape Speed 


The speed of a particle on a circular orbit satisfies

$$\frac{dv}{dt} = \frac{v^2}{r} = \frac{GM(r)}{r^2} \quad \Rightarrow \quad v_{\rm circ} = \sqrt{GM(r)/r}. \qquad (25.11)$$

The escape speed is

$$v_{\rm esc} = \sqrt{2|\Phi(r)|} \qquad (25.12)$$

where $\Phi$ is measured relative to its value at spatial infinity.


25.3.3 Useful Spherical Models

Point Mass 

For a point mass $M(r) = m$

• The potential is $\Phi = -Gm/r$.

• The circular velocity is $v_{\rm circ} = \sqrt{Gm/r}$.

• The escape velocity is $v_{\rm esc} = \sqrt{2}\, v_{\rm circ}$.

• The $v_{\rm circ} \propto r^{-1/2}$ circular speed profile is usually referred to as a Keplerian profile.


Uniform Density Sphere 

For a static uniform sphere of density $\rho$

• The gravity is

$$g = -\frac{GM(r)}{r^2} = -\frac{4\pi}{3} G\rho\, r. \qquad (25.13)$$

• The circular speed is

$$v_{\rm circ} = \sqrt{\frac{4\pi G\rho}{3}}\; r. \qquad (25.14)$$

• The orbital period is

$$t_{\rm orbit} = \frac{2\pi r}{v_{\rm circ}} = \sqrt{\frac{3\pi}{G\rho}} \qquad (25.15)$$

which is independent of the radius of the orbit.

• The potential, measured with respect to the origin, is a parabola and the equation of motion
for test particles within the sphere is

$$\ddot{\mathbf{r}} = -\frac{4\pi G\rho}{3}\, \mathbf{r}. \qquad (25.16)$$

The period of any orbit in this potential is the same as that for a circular orbit.

• The dynamical time corresponding to density $\rho$ is variously defined as the orbital time, the
collapse time etc., but is always on the order of $t_{\rm dyn} = 1/\sqrt{G\rho}$.

All of the above properties are independent of the radius $a$ of the sphere, and the dynamical and
other time-scales are well defined in the limit that $a \to \infty$. The potential with respect to spatial
infinity depends on the radius and is given by

$$\Phi(r) = \begin{cases} -2\pi G\rho\, (a^2 - \tfrac{1}{3} r^2) & r < a \\ -\dfrac{4\pi G\rho\, a^3}{3r} & r > a \end{cases} \qquad (25.17)$$
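
As a quick numerical illustration of the radius-independence of the orbital period, the sketch below evaluates $2\pi r/v_{\rm circ}$ at several radii inside a uniform sphere, using $v_{\rm circ} = \sqrt{4\pi G\rho/3}\, r$ as follows from the linear gravity law. The density of 1 g cm$^{-3}$ and the function names are assumptions chosen only for the example.

```python
import math

G_CGS = 6.67e-8  # cm^3 g^-1 s^-2


def circular_speed(r, rho):
    """Circular speed (cm/s) at radius r (cm) inside a uniform sphere of density rho (g/cm^3)."""
    return math.sqrt(4.0 * math.pi * G_CGS * rho / 3.0) * r


def orbital_period(rho):
    """Orbital period sqrt(3 pi / (G rho)) in seconds; note there is no dependence on r."""
    return math.sqrt(3.0 * math.pi / (G_CGS * rho))


def dynamical_time(rho):
    """Order-of-magnitude dynamical time 1 / sqrt(G rho) in seconds."""
    return 1.0 / math.sqrt(G_CGS * rho)


rho = 1.0  # g cm^-3, an illustrative value
for r in (1.0e5, 1.0e8, 1.0e11):
    period = 2.0 * math.pi * r / circular_speed(r, rho)
    print(f"r = {r:.0e} cm: 2 pi r / v_circ = {period:.4e} s")
print(f"sqrt(3 pi / G rho) = {orbital_period(rho):.4e} s, t_dyn = {dynamical_time(rho):.4e} s")
```

The three printed periods agree with each other and with the closed-form expression, and differ from $t_{\rm dyn}$ only by the numerical factor $\sqrt{3\pi}$.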


Power Law Density Profile 

A power law density profile $\rho(r) = \rho_0 (r/r_0)^{-\alpha}$ has

• Mass $M(r) \propto r^{3-\alpha}$.

• We need $\alpha < 3$ if the mass at the origin is to be finite.

• The density cusp at the origin can be ‘softened’ as in the NFW models.


• A flat rotation curve results for $\alpha = 2$ and is referred to as a singular isothermal sphere profile.

Hernquist and NFW Models


Mass condensations that grow in cosmological simulations have been found to be quite well described
by double power-law models.

The NFW model is

$$\rho(r) \propto \frac{1}{r (r^2 + r_c^2)} \qquad (25.18)$$

which has asymptotic forms

$$\rho \propto \begin{cases} r^{-1} & r \ll r_c \\ r^{-3} & r \gg r_c \end{cases} \qquad (25.19)$$

(The form usually quoted for the NFW profile, $\rho \propto 1/[r (r + r_c)^2]$, has the same asymptotic behaviour.)



Chapter 26 

Collisionless Systems 


26.1 Relaxation Time 


The statistical mechanics of gravitating systems is very different from that of collisional gases. In the latter,
long range electrostatic forces are screened by the neutrality of matter, whereas for the former the
acceleration of particles is usually dominated by long range forces. For example, for a random
distribution of point masses of mass $m$ and number density $n$, the typical gravity from the nearest
particle is $g \sim Gm n^{2/3}$ while the overall gravity of the system is $g \sim GM_{\rm tot}/R^2 \sim GmnR$, which is
larger than the short-range force by a factor $n^{1/3}R \sim N^{1/3}$ where $N$ is the number of particles in
the system.

In many systems, the short-range gravitational accelerations are almost completely negligible, 
and the state of the system depends entirely on the way it was initially assembled. 

To estimate the effect of graininess of the mass distribution on particle motions consider a collision
with impact parameter $b$. This will give a transverse impulse on the order of the acceleration times
the duration of the impact, or

$$\delta v_\perp \sim \frac{Gm}{b^2}\, \frac{b}{v} = \frac{Gm}{bv}. \qquad (26.1)$$

In crossing the system once the mean number of collisions with impact parameter in the interval $b$
to $b + db$ is $dn \sim (N/R^2)\, b\, db$. Now the mean vector sum of the impulses vanishes, but the mean
square impulse accumulates, and integrating over impact parameters gives the mean square impulse
for one crossing of the system

$$\langle (\delta v_\perp)^2 \rangle \sim \frac{N}{R^2} \int b\, db \left( \frac{Gm}{bv} \right)^2 \sim N \left( \frac{Gm}{Rv} \right)^2 \ln\Lambda \qquad (26.2)$$

with $\Lambda = R/b_{\min}$. The minimum impact parameter is $b_{\min} = Gm/v^2$, and is the impact parameter
that results in a large deflection $\delta v_\perp \sim v$.

Now $v^2 \sim GmN/R$ so $R/b_{\min} \sim Rv^2/Gm = N$ and therefore in one crossing we have

$$\frac{\langle (\delta v_\perp)^2 \rangle}{v^2} \sim \frac{\ln N}{N} \qquad (26.3)$$

which is typically much less than unity. The number of crossings required for the effect of short-range
collisions to become important is $\sim N/\ln N$ and the relaxation time is

$$t_{\rm relax} \sim \frac{N}{\ln N}\, t_{\rm orbit}. \qquad (26.4)$$


• In galaxies with $N \sim 10^{11}$ stars and dynamical, or orbital, time $t_{\rm dyn} \sim 10^8$ yr the relaxation
time is $t_{\rm relax} \sim (N/\ln N)\, t_{\rm dyn} \sim 4 \times 10^{17}$ yr, which greatly exceeds the age of the Universe,
$t \sim H^{-1} \sim 10^{10}$ yr. This allows that the shapes of elliptical galaxies may be supported by
anisotropic pressure, rather than by rotation, as was once thought.

• In globular clusters with $N \sim 10^5$ and $t_{\rm dyn} \sim 10^5$ yr the relaxation time is $\sim 10^9$ yr, shorter
than the Hubble time, so relaxation effects are important for such systems.

• In galaxy clusters with $N \sim 10^2$ and $t_{\rm dyn} \sim 10^9$ yr the relaxation time is somewhat
larger than the Hubble time, so relaxation effects are marginally effective for such systems.
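
The three estimates above can be reproduced with a few lines of Python. This is only a sketch of the order-of-magnitude formula $t_{\rm relax} \sim (N/\ln N)\, t_{\rm dyn}$; the function name and the round value adopted for the Hubble time are assumptions made for the example.

```python
import math

HUBBLE_TIME_YR = 1.0e10  # rough age of the Universe in years, as used in the text


def relaxation_time(n_particles, t_dyn_yr):
    """Two-body relaxation time (N / ln N) * t_dyn, in years."""
    return n_particles / math.log(n_particles) * t_dyn_yr


# (number of particles, dynamical time in years) for the three examples
systems = {
    "galaxy": (1.0e11, 1.0e8),
    "globular cluster": (1.0e5, 1.0e5),
    "galaxy cluster": (1.0e2, 1.0e9),
}
for name, (n, t_dyn) in systems.items():
    t_rel = relaxation_time(n, t_dyn)
    print(f"{name}: t_relax ~ {t_rel:.1e} yr = {t_rel / HUBBLE_TIME_YR:.1e} Hubble times")
```

For the galaxy the logarithm lowers the naive product $N t_{\rm dyn} = 10^{19}$ yr by a factor $\ln N \approx 25$, which leaves the conclusion unchanged: the relaxation time still exceeds the Hubble time by many orders of magnitude.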


26.2 Jeans Equations 


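On time-scales less than the relaxation time, a gravitating system of point masses is described by
the collisionless Boltzmann equation

$$\frac{\partial f}{\partial t} + \mathbf{v} \cdot \nabla_r f - \nabla\Phi \cdot \nabla_v f = 0. \qquad (26.5)$$

Taking the zeroth and first moments of this equation yields the equation of continuity

$$\frac{\partial\rho}{\partial t} + \nabla \cdot (\rho\mathbf{u}) = 0 \qquad (26.6)$$

with $\mathbf{u} = \langle \mathbf{v} \rangle$, and the Euler equation

$$\frac{\partial u_i}{\partial t} + (\mathbf{u} \cdot \nabla) u_i = -\frac{\partial\Phi}{\partial x_i} - \frac{1}{\rho} \frac{\partial (\rho\sigma^2_{ij})}{\partial x_j} \qquad (26.7)$$

where $\sigma^2_{ij} = \langle (v_i - u_i)(v_j - u_j) \rangle$ is the velocity dispersion tensor. In the steady state, the left hand
side is $(\mathbf{u} \cdot \nabla)\mathbf{u}$ and is the centrifugal acceleration. Note that we do not split the pressure tensor
$\rho\sigma^2_{ij}$ into isotropic and anisotropic parts as we did for a collisional gas.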


Together with Poisson’s equation (25.5) to relate the potential $\Phi$ to the mass density, (26.6) and
(26.7) provide 5 equations, known as Jeans’ equations. However, one has in general 10 unknowns
(the density $\rho$, the three components of the streaming velocity $\mathbf{u}$ and the six components of the
symmetric $3 \times 3$ tensor $\sigma^2_{ij}$). In order to make progress it is necessary to make some assumption
about the velocity dispersion tensor; common choices are to model $\sigma^2_{ij}$ as isotropic or, for a spherical
system, to introduce an anisotropy parameter specifying the ratio of radial to tangential components
of $\sigma^2_{ij}$.


26.3 The Virial Theorem 


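Consider the moment of inertia of a system of point masses $I = \sum m r^2$. The time derivative is
$\dot{I} = 2 \sum m\, \mathbf{r} \cdot \dot{\mathbf{r}}$ and taking a further time derivative gives

$$\frac{1}{2} \ddot{I} = \sum m \dot{r}^2 + \sum m\, \mathbf{r} \cdot \ddot{\mathbf{r}}. \qquad (26.8)$$

Requiring that $\ddot{I}$ vanish for a stable system and expressing the acceleration $\ddot{\mathbf{r}}$ as a sum of the gravity
from all the other particles gives

$$2T + \sum_{\mathbf{r}} m\, \mathbf{r} \cdot \sum_{\mathbf{r}' \neq \mathbf{r}} \frac{G m' (\mathbf{r}' - \mathbf{r})}{|\mathbf{r}' - \mathbf{r}|^3} = 0 \qquad (26.9)$$

with $T$ the kinetic energy of the particles.

Now switching $\mathbf{r} \leftrightarrow \mathbf{r}'$ in the last term relabels the double sum but changes the sign of $\mathbf{r}' - \mathbf{r}$.
Replacing the term by the average of the two forms turns $\mathbf{r} \cdot (\mathbf{r}' - \mathbf{r})$ into $-\frac{1}{2} |\mathbf{r}' - \mathbf{r}|^2$, so we can write this as

$$2T - \frac{1}{2} \sum_{\mathbf{r}} \sum_{\mathbf{r}' \neq \mathbf{r}} \frac{G m m'}{|\mathbf{r}' - \mathbf{r}|} = 0. \qquad (26.10)$$

The second term is the binding energy $W$ of (25.8), and we therefore have the virial theorem

$$2T + W = 0. \qquad (26.11)$$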


The virial theorem provides a useful way to estimate the mass of bound systems. For example,
if we assume equal mass particles, then the particle mass $m$ is given by

$$m = \frac{2 \sum v^2}{G \sum_{\rm pairs} 1/r_{12}} \qquad (26.12)$$

where the sum over pairs counts each pair twice, as in (25.8).


For a roughly spherical system it is reasonable to assume that the 3-dimensional velocity dispersion 
in the numerator is 3 times the observed line of sight velocity dispersion. Similarly, the mean 
harmonic radius which appears in the denominator can be estimated from the observed distribution 
of projected separations, and this provides a useful way to determine the mass of a gravitating 
system. 

The virial theorem gives the correct answer if the luminous particles trace the mass, but will fail 
if, for example, the dark matter has a different profile from the luminous particles. 
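
A minimal sketch of this estimator in Python is given below, assuming full three-dimensional positions and velocities are available (real data would supply only projected separations and line of sight velocities). The function name and the units with $G = 1$ are assumptions made for the example. Each pair is visited once here, so the factor 2 of the double-counted form drops out.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (assumed for this demo)


def virial_particle_mass(positions, velocities, grav_const=G):
    """Particle mass from 2T + W = 0 for equal mass particles.

    positions, velocities: lists of (x, y, z) tuples.
    Uses m = sum(v^2) / (G * sum over distinct pairs of 1/r_ij).
    """
    sum_v2 = sum(vx * vx + vy * vy + vz * vz for vx, vy, vz in velocities)
    sum_inv_r = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):   # each pair counted once
            sum_inv_r += 1.0 / math.dist(positions[i], positions[j])
    return sum_v2 / (grav_const * sum_inv_r)


# Check on an equal mass binary on a circular orbit: separation d, each star
# moving at speed v with v^2 = G m / (2 d).
true_mass = 3.0
d = 2.0
v = math.sqrt(G * true_mass / (2.0 * d))
binary_positions = [(-d / 2.0, 0.0, 0.0), (d / 2.0, 0.0, 0.0)]
binary_velocities = [(0.0, -v, 0.0), (0.0, v, 0.0)]
print(virial_particle_mass(binary_positions, binary_velocities))
```

For the circular binary the virial theorem holds instantaneously, so the estimator returns the input mass; for a many-body system it holds only in a time-averaged sense and the estimate fluctuates accordingly.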


26.4 Applications of the Virial Theorem 

26.4.1 Spherical Collapse Model 

Consider a uniform static sphere of dust of mass $M$ and radius $R_i$. A perfectly symmetrical sphere
will collapse to form a black hole, but this requires an enormous collapse factor, and any sensible
amount of asphericity or initial angular momentum will cause the system to instead oscillate and
eventually settle into some virialized final state. The initial energy is all potential, $E_i = W_i \sim -GM^2/R_i$,
and energy is conserved so we have $T_f + W_f = E_f = E_i$. But the virial theorem tells us
that $2T_f + W_f = 0 \Rightarrow T_f = -W_f/2$ and therefore $T_f + W_f = W_f/2 = W_i$, which means that the
sphere must collapse by about a factor 2 in radius.


26.4.2 Galaxy Cluster Mass to Light Ratios 

Photometric observations provide the surface brightness $\Sigma_l$ of a cluster. On the other hand, observations
of the velocity dispersion $\sigma^2$ together with the virial theorem give $\sigma^2 \sim GM/R \sim G\Sigma_m R$,
where $\Sigma_m$ is the projected mass density. If the distance $D$ to the cluster is known from its redshift
then the mass to light ratio can be estimated as

$$\frac{M}{L} = \frac{\Sigma_m}{\Sigma_l} = \frac{\sigma^2}{G D \theta \Sigma_l} \qquad (26.13)$$

where $\theta$ is the angular size of the cluster.

Applying this technique, Zwicky found that clusters have $M/L \simeq 300\, M_\odot/L_\odot$, where the subscript $\odot$
indicates solar values.


26.4.3 Flat Rotation Curve Halos 


More accurate masses are obtained for disk galaxies if the rotation curve can be measured (say from
HI radio measurements). Once the measured velocity width has been corrected for inclination, the
mass is given by

$$\frac{GM(r)}{r} = v^2_{\rm circ}(r). \qquad (26.14)$$
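
As a small worked example, the sketch below converts a circular speed and radius into an enclosed mass via $M(r) = v^2_{\rm circ} r/G$. The value of $G$ in kpc (km/s)$^2$ per solar mass is approximate, and the 200 km/s rotation speed is an illustrative assumption rather than a measurement.

```python
G_KPC = 4.30e-6  # G in kpc (km/s)^2 / M_sun (approximate)


def enclosed_mass(v_circ_kms, r_kpc):
    """Mass in solar masses inside radius r_kpc for circular speed v_circ_kms."""
    return v_circ_kms ** 2 * r_kpc / G_KPC


v_flat = 200.0  # km/s, a hypothetical flat rotation curve
for r_kpc in (10.0, 20.0, 50.0):
    print(f"r = {r_kpc:4.0f} kpc: M(<r) ~ {enclosed_mass(v_flat, r_kpc):.2e} M_sun")
```

With the speed held fixed the enclosed mass grows in direct proportion to the radius, reaching a few times $10^{11} M_\odot$ at 50 kpc for this choice of speed.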


Spiral galaxies are found to have rather flat rotation curves extending to at least a few tens of
kpc (well beyond the radius where most of the visible stars reside) and this indicates that these
galaxies have dark halos with $M \propto r$.


26.5 Masses from Kinematic Tracers


The virial theorem is exact, but requires that the light traces the mass. This is not a very good
assumption. The Euler equation (26.7) can be used to constrain the gravitational potential of a
system using kinematic data for particles which need not necessarily trace the mass.

For a static non-rotating system, and assuming an isotropic velocity dispersion, the Euler
equation is

$$\nabla (n\sigma^2_{1D}) = -n \nabla\Phi \qquad (26.15)$$

with $n$ the number density of particles. This is just like the equation of hydrostatic equilibrium for
a collisional system.

Now let’s assume that the observed velocity dispersion happens to be isothermal, i.e. independent
of radius, $d\sigma^2_{1D}/dr = 0$, and so

$$\sigma^2_{1D} = -\frac{n\, d\Phi/dr}{dn/dr} = {\rm constant}. \qquad (26.16)$$

If we further assume that the kinematic tracer density is a power law $n \propto r^{-\gamma}$ then this is consistent
with a mass profile

$$M(r) = M_0 (r/r_0) \qquad (26.17)$$

since we then have $\nabla\Phi = GM/r^2 = GM_0/(r_0 r) \propto 1/r$.

With these assumptions, the observed velocity dispersion is related to the circular velocity $v_c^2 = GM_0/r_0$ by

$$\sigma^2_{1D} = v_c^2 \left| \frac{d \ln n}{d \ln r} \right|^{-1}. \qquad (26.18)$$


• If the tracers have the same profile as the mass ($n \propto 1/r^2$) then $v_c^2 = 2\sigma^2_{1D}$.

• If the tracers have a steeper (shallower) profile then the observed velocity dispersion will be
lower (higher). This is not surprising since for shallower profiles the test particle orbits take
them further up the potential.

• For gas and galaxies with similar profiles residing in the same potential well we expect the
specific energy to be the same, or

$$\frac{1}{2} \sigma^2_{3D} = \frac{(3/2) kT}{\mu m_p} \qquad (26.19)$$

where $\mu$ is the mean molecular weight ($\mu = 1/2$ for fully ionized hydrogen for instance). This
is a testable prediction which seems to be quite well obeyed. This result places constraints
on possible long range interactions between dark matter particles since these would affect the
galaxies but not the gas.


This was for a power-law tracer density profile. Another interesting model is a population of
tracers with finite extent residing in a flat rotation curve extended halo. The velocity dispersion can
then be found much as in the derivation of the virial theorem by considering the second derivative
of the moment of inertia:

$$0 = \frac{1}{2} \ddot{I} = \sum \dot{r}^2 + \sum \mathbf{r} \cdot \ddot{\mathbf{r}} \qquad (26.20)$$

but now with $\ddot{\mathbf{r}} = -GM_0 \hat{\mathbf{r}}/(r r_0)$ this gives $\sum \dot{r}^2 = (GM_0/r_0) \sum 1$ or equivalently

$$\sigma^2_{3D} = \langle \dot{r}^2 \rangle = v_c^2 \qquad (26.21)$$

as compared to $\sigma^2_{3D} = (3/2) v_c^2$ for particles with an $n \propto 1/r^2$ profile.


26.6 The Oort Limit

The Jeans equations can also be applied to a flattened disk geometry, in which case the vertical
mass profile can be determined from measurements of densities and velocities of kinematic tracers
oscillating up and down through the plane.

We start with the collisionless Boltzmann equation. For a planar geometry we have $\nabla_r f = \hat{\mathbf{z}}\, \partial f/\partial z$
and $\nabla_r \Phi = \hat{\mathbf{z}}\, \partial\Phi/\partial z$. Multiplying by $v_z$ and integrating over velocity (with $\partial f/\partial t = 0$ as appropriate for
a steady state solution) gives

$$\int d^3v\, v_z^2\, \frac{\partial f}{\partial z} - \frac{\partial\Phi}{\partial z} \int d^3v\, v_z\, \frac{\partial f}{\partial v_z} = 0. \qquad (26.22)$$

Integrating the second term by parts and dividing through by the number density $n$ gives

$$\frac{1}{n} \frac{\partial (n \langle v_z^2 \rangle)}{\partial z} = -\frac{\partial\Phi}{\partial z} \qquad (26.23)$$

but $\nabla^2\Phi = \partial^2\Phi/\partial z^2 = 4\pi G\rho$ and so

$$\rho = -\frac{1}{4\pi G} \frac{\partial}{\partial z} \left[ \frac{1}{n} \frac{\partial (n \langle v_z^2 \rangle)}{\partial z} \right] \qquad (26.24)$$

where the quantity on the right hand side, which is, in essence, the second derivative of the kinetic
pressure of the tracers, is observable given a collection of stars with well determined distances.
Applying this to stars in the solar neighborhood, Oort found

$$\rho \simeq 0.15\, M_\odot\, {\rm pc}^{-3} \qquad (26.25)$$

which is known as the Oort limit.

Integrating the density gives the surface mass density

$$\Sigma(z) = \int_{-z}^{z} dz'\, \rho(z') = -\frac{1}{2\pi G} \frac{1}{n} \frac{\partial (n \langle v_z^2 \rangle)}{\partial z} \qquad (26.26)$$

or, for $z = 700$ pc,

$$\Sigma(700\, {\rm pc}) \simeq 90\, M_\odot\, {\rm pc}^{-2}. \qquad (26.27)$$

Comparing this to the surface luminosity density $\Sigma_l$ is of some interest, since it can tell us if
there is dark matter in the disk. The value of $\Sigma_l$ has been hotly debated, but the estimates of the
fraction of missing matter in the disk range from zero (Gilmore) to about 50% (Bahcall).
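
To make the density estimate (26.24) concrete, the sketch below applies it with centred finite differences to a made-up tracer population: an isothermal population with a Gaussian vertical profile, for which the exact answer is the constant density $\sigma_z^2/(4\pi G h^2)$. The tracer parameters, the function name and the approximate value of $G$ in pc (km/s)$^2$ per solar mass are all assumptions chosen for illustration, not data.

```python
import math

G_PC = 4.30e-3  # G in pc (km/s)^2 / M_sun (approximate)


def density_from_tracers(z, n, vz2, grav_const=G_PC):
    """Mass density from tracer counts and velocities, using centred differences.

    z   : uniformly spaced heights (pc)
    n   : tracer number density at each height (any units)
    vz2 : mean square vertical velocity at each height, (km/s)^2
    Returns (height, rho) pairs for the interior grid points, rho in M_sun/pc^3.
    """
    h = z[1] - z[0]
    pressure = [ni * vi for ni, vi in zip(n, vz2)]
    # kick[i] = (1/n) d(n <v_z^2>)/dz at grid point i (undefined at the two ends)
    kick = [None] * len(z)
    for i in range(1, len(z) - 1):
        kick[i] = (pressure[i + 1] - pressure[i - 1]) / (2.0 * h * n[i])
    profile = []
    for i in range(2, len(z) - 2):
        dkick_dz = (kick[i + 1] - kick[i - 1]) / (2.0 * h)
        profile.append((z[i], -dkick_dz / (4.0 * math.pi * grav_const)))
    return profile


# Hypothetical isothermal tracers with a Gaussian vertical profile.
sigma_z = 20.0        # km/s (assumed)
scale_height = 300.0  # pc (assumed)
heights = [-500.0 + 10.0 * i for i in range(101)]
counts = [math.exp(-0.5 * (zi / scale_height) ** 2) for zi in heights]
dispersion = [sigma_z ** 2] * len(heights)

rho_profile = density_from_tracers(heights, counts, dispersion)
rho_expected = sigma_z ** 2 / (4.0 * math.pi * G_PC * scale_height ** 2)
print(f"recovered rho(0) = {rho_profile[len(rho_profile) // 2][1]:.4f} M_sun/pc^3")
print(f"analytic value   = {rho_expected:.4f} M_sun/pc^3")
```

With real star counts the double differentiation amplifies noise considerably, which is one reason the inferred disk density has been debated.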


26.7 Problems 

26.7.1 Two-body Relaxation. 

Consider a virialised self-gravitating system of size R consisting of N identical particles of mass m 
(picture). 

1. Give order of magnitude estimates of 

(a) The long-range gravitational acceleration due to the system as a whole. 

(b) The typical distance b from a particle to its nearest neighbor. 

(c) The short-range gravitational acceleration due to the nearest neighbor. 

(d) The ratio of long- to short-range force (in terms of N ). 

(e) The typical velocity $v$ of a particle.

2. What is the typical transverse impulse $\Delta v$ suffered by a particle moving at velocity $v$ as it
passes a neighboring particle at impact parameter $b$?

3. How many such encounters does a particle suffer as it traverses the whole system once?

4. The mean impulse from many encounters averages to zero, but the mean square impulse
accumulates.

(a) What is the accumulated mean square impulse in one crossing?

(b) How large is this compared to the mean square orbital velocity?

(c) At this rate, how many orbits or crossing times will it take for the particle’s velocity to
be significantly affected by nearest neighbor collisions?

5. How does this ‘relaxation time’ compare with the age of the Universe for

(a) A galaxy consisting of $\sim 10^{10}$ stars with a dynamical time of $\sim 10^8$ years?

(b) A globular cluster with $\sim 10^5$ stars and a dynamical time of $\sim 10^5$ years?



Chapter 27 


Evolution of Gravitating Systems 


27.1 Negative Specific Heats 

A curious property of bound stable gravitating systems is that they have negative specific heats.
The total energy is $E = K + W$ (with $K$ here the kinetic energy) and the virial theorem is $W = -2K$
so the total energy is

$$E = -K. \qquad (27.1)$$

The kinetic energy scales with the kinetic temperature $T$ defined such that $K = \sum \frac{1}{2} m v^2 = \frac{3}{2} NkT$
and so the specific heat is

$$C = \frac{dE}{dT} = -\frac{3}{2} Nk \qquad (27.2)$$

which is negative.

This leads to instability if gravitating systems are allowed to interact with other systems. As
an example, consider two virialized clusters, both consisting of $N$ equal mass particles, the
first with radius $R_1$, kinetic energy $K_1$, potential energy $W_1$ and similarly for cluster 2. The virial
theorem tells us that each cluster satisfies $W = -2K$.

The total entropy for one of these clusters is $S = Nk (\ln T^{3/2} - \ln n)$ where $n$ is the density. Now
$n \propto 1/R^3 \propto (-W)^3$ and $T \propto K$, so the entropy for a cluster is

$$S = Nk \left( \frac{3}{2} \ln K - 3 \ln(-W) \right) + {\rm constant}. \qquad (27.3)$$

Now the virial theorem tells us that if the energy of the cluster changes, the changes in logarithms
of the energies here are related by $\Delta \ln K = \Delta \ln(-W)$ and therefore the change in the entropy if
the kinetic energy changes by $\Delta K$ is

$$\Delta S = -\frac{3}{2} Nk\, \frac{\Delta K}{K}. \qquad (27.4)$$

This says that if a system gets hotter ($\Delta K > 0$) it loses entropy.

Now if some energy passes between the two systems such that $\Delta K_1 = \Delta K$ and $\Delta K_2 = -\Delta K$,
then the change in the total entropy is

$$\Delta S_{\rm total} = -\frac{3}{2} Nk\, \Delta K \left( \frac{1}{K_1} - \frac{1}{K_2} \right). \qquad (27.5)$$

Thus if $K_1 > K_2$, so that system 1 has a higher kinetic temperature than system 2, then for the
change in the total entropy to be positive requires $\Delta K > 0$, which, since the total energy $E = -K$
is negative, corresponds, as usual, to a transfer of energy from the hotter system to the cooler, and,
due to the negative specific heat, this results in the hotter system becoming still hotter and vice
versa.
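
The runaway can be illustrated with a toy calculation. The sketch below encodes the total entropy change for a transfer of kinetic energy $\Delta K$ to cluster 1 (and $-\Delta K$ to cluster 2), then repeatedly applies small transfers in whichever direction raises the entropy. The starting energies, the step size and the units with $Nk = 1$ are arbitrary assumptions.

```python
def entropy_change(delta_k, k1, k2, n_particles=1.0, k_boltz=1.0):
    """Total entropy change when cluster 1 gains kinetic energy delta_k
    and cluster 2 loses the same amount (both clusters stay virialized)."""
    return -1.5 * n_particles * k_boltz * delta_k * (1.0 / k1 - 1.0 / k2)


k1, k2 = 2.0, 1.0   # cluster 1 starts hotter (arbitrary units, assumed)
step = 0.01         # size of each small transfer (assumed)
history = [(k1, k2)]
for _ in range(50):
    # choose the direction of transfer that increases the total entropy
    delta_k = step if entropy_change(step, k1, k2) > 0.0 else -step
    k1 += delta_k
    k2 -= delta_k
    history.append((k1, k2))

print(f"start: K1 = {history[0][0]:.2f}, K2 = {history[0][1]:.2f}")
print(f"end:   K1 = {history[-1][0]:.2f}, K2 = {history[-1][1]:.2f}")
```

Every entropy-increasing step widens the gap between the two kinetic temperatures, so no equilibrium with $K_1 = K_2$ is approached.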

We have been rather vague about how the energy exchange actually takes place.


Figure 27.1: Illustration of phase-mixing for an initially cold gravitating system (such as cold-dark 
matter). The three panels illustrate the folding of the phase sheet as the system evolves. 


• One manifestation of this instability is the possibility of runaway interactions between binary 
star systems and single stars within globular clusters. Here, if the orbital motion of the 
binaries exceeds the typical velocity of stars in the cluster then energy will be transferred from 
the binaries to the general star population, causing the binaries to become still hotter etc. 

• A similar interaction is between stars of different mass within a cluster. It is not unreasonable
to assume that initially the stars all have similar velocities. However, that means that the
energy per particle, and therefore also the kinetic temperature, scales with the mass, and so
collisions will tend to transfer energy from the more massive to the lighter stars. This leads to
a concentration of massive stars in the core, while the lighter stars are expelled or scattered to
large radius orbits.

• Another example is a hierarchical system with bound gravitating sub-units orbiting within 
a large system (much like globular clusters in a galaxy say, or galaxies within a cluster of 
galaxies), where the energy transfer is mediated by tidal forces and inelasticity of collisions 
between the clusters. Entropic considerations here say that interactions which increase entropy 
are those where energy is transferred from hotter to colder bodies, where temperature is defined 
so that kT is the kinetic energy of a ‘meta-particle’. This usually means transfer of energy from 
large-scales to smaller scales (for example, a globular cluster orbiting in a galaxy has much 
more energy than a single star in a globular cluster). This causes the ‘hot’ system (the cluster 
orbits within the halo) to give energy to the sub-units (the stars in the clusters themselves). 
This causes the hot system to heat up and the cold system to cool further until eventually the 
clusters will become unbound. 

Negative specific heats lead to a gravo-thermal catastrophe. There is no hope of finding a true 
equilibrium solution like the Maxwellian for a collisional gas. 


27.2 Phase Mixing 

Liouville’s theorem tells us that the phase-space density is constant along particle orbits. If the
dark matter is initially ‘cold’ (i.e. it has negligible thermal or random velocities) then it occupies only
a 3-dimensional subspace $v_x = v_y = v_z = 0$ of phase-space, and it will remain confined to a
3-dimensional sheet forever. Initially, this phase sheet is flat, but as structures start to form the
particles will accelerate and the velocity at a given position will deviate from zero. Eventually,
particles will fall into potential wells and oscillate, with the result that the phase sheet gets wrapped
up, as illustrated for one spatial dimension in figure 27.1.

Once the phase sheet has folded over on itself, there will no longer be a single velocity at each
point in space; rather there will be a set of discrete velocities which are populated. Our galaxy is
about 50 dynamical times old, so one might expect there to be about 50 distinct ‘streams’ of dark
matter passing through our neighborhood.

While the true phase space density does not evolve, this wrapping of phase sheets will cause the
coarse-grained phase-space density to decrease.

27.3 Violent Relaxation 

In a stable potential particles orbit at fixed energy and $dE/dt = 0$. When a gravitating system is
initially forming (perhaps from the merger of sub-units) the potential will be rapidly fluctuating, and
the particle energy will change as $dE/dt = m\, \partial\Phi/\partial t$. This process will tend to generate a distribution
in which the velocities of particles are independent of the particle mass, as was originally pointed
out by Lynden-Bell.


27.4 Dynamical Friction 


A heavy object orbiting in a system consisting of predominantly lighter particles will tend to sink 
towards the center because of dynamical friction. The calculation of the dynamical friction force is 
similar to the calculation of the relaxation time and is also very similar indeed to the calculation of 
bremsstrahlung radiation. 

There are two ways of looking at this. One picture is that the massive particle will induce a wake
in the lighter particles, and the excess density in the wake exerts a gravitational pull on the heavy
particle. Another view is that a massive particle passing by causes an impulse in the ‘background’
particles, and transfers to them kinetic energy which must come eventually from the kinetic energy
of the massive particle. We will follow the second route.

Let a heavy particle of mass $M$ move at velocity $V$ through a background sea of light particles
of mass $m$. As it passes by, a background particle at impact parameter $b$ is given a velocity impulse
$\delta v = (GM/b^2) \times (b/V)$ and a corresponding kinetic energy $\Delta E = m (\delta v)^2/2 \sim m (GM/bV)^2$. Note
the rate of such collisions is $dN/dt = nVb\, db$ with $n$ the density of background particles. The
deceleration of the massive particle is then

$$\frac{dV}{dt} = \frac{1}{MV} \frac{dE}{dt} = -\frac{m}{MV} \int db\, b\, nV \left( \frac{GM}{bV} \right)^2. \qquad (27.6)$$

As before, this gives a logarithmic integral and we have

$$\frac{dV}{dt} = -\frac{G^2 M m n}{V^2} \ln\Lambda \qquad (27.7)$$

where as before $\Lambda$ is the ratio of the size of the system to the minimum impact parameter
$b_{\min}$, and the result is rather insensitive to the precise value of $b_{\min}$.


• The acceleration is proportional to the mass of the heavy particle, so the force is proportional
to $M^2$. This can also be obtained from considering the wake: the amplitude of the density
fluctuation in the wake is proportional to $M$.

• The acceleration is inversely proportional to the square of the massive particle velocity.

• The analysis here is oversimplified in that the velocity of the background particles was not taken
into account. A more careful analysis shows, not surprisingly, that the energy transferred to
rapidly moving background particles is less than if they are stationary.

• Applications include decay of globular cluster orbits in a massive galactic halo. For $M \sim 10^6 M_\odot$
and $V \sim 250$ km/s the dynamical friction time becomes comparable to the age of the
universe at about 1 kpc. The Magellanic clouds are more distant ($\sim 50$ kpc) but much more
massive ($M \sim 10^{10} M_\odot$) and again the friction time-scale is on the order of the Hubble time.

• Dynamical friction may play a role in building giant ‘cD’ galaxies in the centers of clusters.


27.5 Collisions Between Galaxies

Galaxies will occasionally collide in clusters. However, since the orbital velocity in the cluster is 
typically on the order of 1000 km/s which is considerably greater than the internal rotation velocity 
(typically 200 km/s) these collisions have little effect on the galaxies since the galaxies just feel a 
short impulsive force and have no time to respond. 

In poor clusters and groups the relative collision velocity is close to the galaxy internal velocities 
and such collisions would be inelastic and would lead to merging of galaxies. 

The question whether elliptical galaxies form from merging of spirals has been hotly debated. 
On one side, there are certainly examples of tidal tails indicating ongoing merging systems. On the 
other, there is evidence that ellipticals are very old and can be found in a mature state at redshifts 
z ~ 1, so if they did form by merging they did so a long time ago. 

27.6 Tidal Stripping 

Another evolutionary effect in gravitating systems is tidal stripping. The outer parts of galaxies in
clusters may be stripped in this way. The tidal radius is the radius at which the mean density of the
satellite is equal to the mean density of the parent body within the satellite’s orbit.



Part VI

Chapter 28

Friedmann-Robertson-Walker Models


28.1 Newtonian Cosmology 


Consider a small uniform density uniformly expanding sphere of pressure-free dust of radius $a$ and
density $\rho$. Gauss’ law tells us that the acceleration of particles at the edge of the sphere is
$\ddot{a} = -GM/a^2$, or, with $M = \frac{4}{3}\pi\rho a^3$,

$$\ddot{a} = -\frac{4}{3}\pi G\rho a. \qquad (28.1)$$

This acceleration equation can be integrated by multiplying by $2\dot{a}$, which gives

$$2\dot{a}\ddot{a} = \frac{d\dot{a}^2}{dt} = -\frac{2GM\dot{a}}{a^2} \qquad (28.2)$$

so

$$\dot{a}^2 = -\int dt\, \frac{2GM\dot{a}}{a^2} = -\int da\, \frac{2GM}{a^2} = \frac{2GM}{a} + {\rm constant}. \qquad (28.3)$$

With $M = \frac{4}{3}\pi\rho a^3$ again, this gives the energy equation

$$\dot{a}^2 = \frac{8}{3}\pi G\rho a^2 + 2E_0 \qquad (28.4)$$

where $E_0$ is constant. This equation expresses conservation of energy, the term on the left hand
side being proportional to the kinetic energy and the first term on the right hand side being
proportional to (minus) the potential energy. More precisely, $E_0$ is the total energy per unit mass
(the specific energy, that is) of test particles on the boundary of the sphere.

The equivalence of (28.1) and (28.4) could have been established by differentiating the latter.
This gives

$$2\dot{a}\ddot{a} = \frac{8}{3}\pi G (\dot{\rho} a^2 + 2\rho a\dot{a}) \qquad (28.5)$$

but with the continuity equation, which in this context is $\rho a^3 = {\rm constant}$, or

$$\dot{\rho} = -3 \frac{\dot{a}}{a} \rho \qquad (28.6)$$

we recover (28.1).


Note that both sides of (28.1) are linear in the dust-ball radius $a$, so if $a(t)$ is a solution then so
is $a' = \alpha a$ for an arbitrary constant $\alpha$. This means that, if the density is initially uniform, all of the
shells of the dust cloud evolve in the same way, and the density will remain uniform.

It is interesting to contrast the properties of a uniform density gravitating cloud to a sphere of
uniform electrical charge density. In the latter case, the electric field at the surface of the sphere
grows linearly with the size of the sphere. Since the electric field is an observable quantity, this
means that there is no sensible solution as $a \to \infty$; it also means that an observer within a finite
sphere can always determine the direction to the center of the sphere from local measurements of
$\mathbf{E}$. The analogous quantity in the gravitating sphere is the gravity vector $\mathbf{g}$, which also diverges
linearly with the sphere radius. However, the gravity itself is not directly observable (this is
fundamentally because of the equivalence principle; all particles accelerate alike in a gravitational
field) and the only directly observable property of the gravity is its spatial gradient, which is the
tidal field tensor $\partial^2\Phi/\partial x_i \partial x_j$. Since the gravity is proportional to radius this means that the tide is
spatially uniform and perfectly regular in the limit $a \to \infty$. Also it means that an observer cannot
determine the direction to the center from local measurements; each observer simply sees locally
isotropic expansion. Newtonian gravity can therefore model an expanding pressure-free cosmology
of arbitrary size.

Newton, Laplace and contemporaries were of course unaware that we live in a galaxy surrounded
by a seemingly infinite sea of other galaxies which, on large scales, are apparently uniformly
distributed. Nor were they aware that these galaxies are receding from us according to Hubble’s law.
Had they been privy to this information, they would have had no difficulty concocting the physical
model above which is used by all practicing cosmologists today in interpreting their observations.

The system of equations above is also valid in general relativity, as originally shown by Friedmann
and by Robertson and Walker. This is because Newtonian gravity provides the correct description
of a finite sphere in the limit $a \to 0$, and because of Birkhoff’s theorem, which is the relativistic
equivalent of Gauss’ law, and which says that for a spherically symmetric mass distribution, the
gravity within some shell is independent of the matter distribution outside. We will refer to these
models as the FRW models.


28.2 Solution of the Energy Equation 

There is an analytic parametric solution of the energy equation:

$$a(\eta) = A (1 - \cos\eta), \qquad t(\eta) = B (\eta - \sin\eta) \qquad (28.7)$$

from which it follows that the expansion velocity is

$$\dot{a} = \frac{da}{dt} = \frac{da/d\eta}{dt/d\eta} = \frac{A \sin\eta}{B (1 - \cos\eta)}. \qquad (28.8)$$

Substituting these in (28.4) one finds that this equation is satisfied if we choose the constants $A$, $B$
to be

$$A = GM/2|E_0|, \qquad B = GM/(2|E_0|)^{3/2} \qquad (28.9)$$

where $M = \frac{4}{3}\pi\rho a^3$.

These are the solutions for energy constant $E_0 < 0$, corresponding to a gravitationally bound
system which first expands but then turns around and collapses back to zero radius. The total time
from big-bang to big-crunch in these models is $2\pi B$. A family of such solutions is shown as the
lower solid curves in figures 28.1 and 28.2.


There are analogous solutions for the case of $E_0 > 0$ where $A$ and $B$ are still given by (28.9) but
where the trigonometric functions are replaced by their hyperbolic equivalents:

$$a(\eta) = A (\cosh\eta - 1), \qquad t(\eta) = B (\sinh\eta - \eta). \qquad (28.10)$$

Such solutions are unbound and expand forever. The constant $B$ in this case is the time at which
the kinetic and potential energies become comparable to the total energy. For $t \ll B$ the kinetic and
potential energies are much larger, in modulus, than the total energy, while for $t \gg B$ the potential
energy becomes negligible and the solutions become freely expanding with $a \propto t$.

Figure 28.1: Scale factor $a$ vs time for a family of FRW models. The quantities plotted in the lower
set of solid curves are $t = B (\eta - \sin\eta)$ and $a = B^{2/3} (1 - \cos\eta)$ for $B = 0.5, 1.0, 2.0, 4.0, 8.0$. The
quantities plotted in the upper set of solid curves are $t = B (\sinh\eta - \eta)$ and $a = B^{2/3} (\cosh\eta - 1)$
for the same set of $B$ values. These curves are the time evolution of the radii of spheres of the
same mass for various energies. The lower/upper curves have negative/positive total energy. All of
these solutions become identical at early times (this is because the kinetic and potential energies are
nearly equal but opposite and much larger in modulus than the total energy). The dashed curve is
the marginally bound case with zero total energy. This can be considered to be the limiting case of
either open or closed models as $B \to \infty$.


A family of such unbound solutions is shown as the upper set of solid curves in figures 28.1 and 28.2. The
positive/negative total energy solutions are referred to as ‘open’ and ‘closed’ respectively. Here we
see that the open models are open-ended in time. The real reason for this choice of terminology will
become apparent later where we will see that the spatial geometries corresponding to these models
are open and closed respectively. The marginally bound solution is commonly referred to as the
Einstein - de Sitter model.

Some useful cosmological parameters are 

• The Hubble parameter or expansion rate is defined to be 


$$ H = \dot a / a \tag{28.11} $$

and has units of inverse time. Its current value is measurable locally from the ratio of recession
velocity to distance and is $H_0 \simeq 75$ km/s/Mpc. It is generally very similar to the inverse age
of the Universe.




• The critical density is defined as

$$ \rho_{\rm crit} = \frac{3H^2}{8\pi G}. \tag{28.12} $$

Figure 28.2: As figure 28.1 but plotted on logarithmic scales. This makes clearer that all models
have the same power-law form, $a \propto t^{2/3}$, for early times $t \ll B$. The open model solutions become
linear, $a \propto t$, for $t \gg B$.


For a given expansion rate $H$, (28.12) gives the density required so that the potential energy
just balances the kinetic energy. Dividing (28.4) by $8\pi G a^2/3$ gives

$$ \rho_{\rm crit} = \rho + \frac{3E_0}{4\pi G a^2} \tag{28.13} $$

so if $\rho > \rho_{\rm crit}$ we must have $E_0 < 0$ and the universe must be bound, and vice versa.

• The cosmological density parameter is

$$ \Omega = \frac{\rho}{\rho_{\rm crit}}. \tag{28.14} $$

The density parameter is close to unity at early times in all models (and also in the big crunch
for closed models) but tends to diverge strongly from the critical density solution if $\Omega$ is not
exactly unity (see figure 28.3).
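
To attach numbers to these parameters, the following Python sketch evaluates the Hubble time $1/H_0$ and the critical density $3H_0^2/8\pi G$ for the value $H_0 \simeq 75$ km/s/Mpc quoted above. The physical constants are rounded standard values and the function names are illustrative.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
MPC_IN_M = 3.0857e22   # metres per megaparsec
YEAR_IN_S = 3.156e7    # seconds per year

def hubble_si(h0_km_s_mpc):
    """Convert H0 from km/s/Mpc to s^-1."""
    return h0_km_s_mpc * 1.0e3 / MPC_IN_M

def critical_density(h0_km_s_mpc):
    """Critical density 3 H^2 / (8 pi G) in kg/m^3."""
    H = hubble_si(h0_km_s_mpc)
    return 3.0 * H ** 2 / (8.0 * math.pi * G)

def hubble_time_gyr(h0_km_s_mpc):
    """Inverse expansion rate 1/H0 in units of 10^9 years."""
    return 1.0 / hubble_si(h0_km_s_mpc) / YEAR_IN_S / 1.0e9

print("critical density [kg/m^3]:", critical_density(75.0))
print("Hubble time [Gyr]:        ", hubble_time_gyr(75.0))
```

The critical density comes out at roughly $10^{-26}$ kg/m$^3$ and the Hubble time at roughly $10^{10}$ yr, consistent with the statement that $1/H$ is similar to the age of the Universe.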


28.3 Asymptotic Behavior 

In the energy equation (28.4) the potential energy term scales as $1/a$, so at early times, when $a$ is
small, it becomes arbitrarily large and the kinetic energy term must also scale as $1/a$. Regardless of
the current value of the energy constant $E_0$, if we go back to sufficiently early times the two time
varying terms will completely dominate and the constant of energy will be negligible. Taking
$E_0 \to 0$ the energy equation becomes

$$ \dot a^2 = 2GM/a. \tag{28.15} $$

Figure 28.3: Evolution of the cosmological density parameter $\Omega$ vs time in matter dominated FRW
models. As before, $t = B(\eta - \sin\eta)$ for the upper curves (closed models) for $B = 0.5, 1.0, 2.0, 4.0, 8.0$,
and $t = B(\sinh\eta - \eta)$ for the lower curves (open models). The density parameter is
$\Omega = 2(1 - \cos\eta)/\sin^2\eta$ for the closed models and $\Omega = 2(\cosh\eta - 1)/\sinh^2\eta$ for the open models.


If we postulate a power-law solution $a = a_0 (t/t_0)^\alpha$ then $\dot a = \alpha a/t$, so the energy equation says
$\alpha^2 a^2/t^2 = 2GM/a \propto 1/a$, or

$$ a \propto t^{2/3}. \tag{28.16} $$

This behavior can also be inferred from the parametric solutions in the limit $\eta \ll 1$. At early times
$\rho \to \rho_{\rm crit}$ and therefore $\Omega \to 1$.
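
The exponent in (28.16) follows from matching powers of $t$ on the two sides of (28.15). A short worked version, assuming the power-law trial form $a = a_0 (t/t_0)^\alpha$ used above:

```latex
\begin{align*}
\dot a^2 &= \alpha^2 a^2 / t^2 \propto t^{2\alpha - 2}, \\
2GM/a &\propto t^{-\alpha}, \\
2\alpha - 2 = -\alpha \quad &\Longrightarrow \quad \alpha = 2/3 , \\
\tfrac{4}{9}\, a^3 = 2GM\, t^2 \quad &\Longrightarrow \quad a = \left(\tfrac{9}{2} GM\right)^{1/3} t^{2/3} .
\end{align*}
```

The last line fixes the normalization as well as the exponent: once $M$ is given, the early-time solution contains no free parameter.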

The future asymptotic behavior depends critically on the energy constant. For $E_0 < 0$ the
Universe will reach a maximum size and will then re-collapse, and the late time behavior will be
the reverse of the initial expansion. If $E_0 > 0$ then at some point the kinetic and potential terms
will become comparable to $E_0$, at which point $\Omega$ will start to deviate appreciably from unity. For
late times $\dot a^2 \to 2E_0$, corresponding to undecelerated expansion. The critical density then scales as
$\rho_{\rm crit} \propto H^2 = (\dot a/a)^2 \propto 1/a^2$ while the actual density scales as $\rho \propto 1/a^3$, and so
$\Omega \to 0$ as $\Omega \propto 1/t$ in the limit $t \to \infty$.


28.4 The Density Parameter 

Estimates of the local expansion rate $H_0$ tell us the current critical density $\rho_{\rm crit}$. Redshift surveys
tell us the local luminosity density of the Universe $\mathcal{L}$. If we multiply this by the mass-to-light ratio
$M/L$ determined from virial analysis of clusters of galaxies we find that the density of matter is
around 0.2 to 0.3 times critical, and therefore that the density parameter is $\Omega \simeq 0.2 - 0.3$.

There is of course considerable slop in the dynamical mass measurements, and also in the implicit
assumption that the $M/L$ of clusters is representative, but there is a growing consensus that this
is about the correct value, and almost unanimous agreement that $\Omega < 1$. Estimates of the
acceleration of the universe from supernova studies also indicate a low matter density. This result
is remarkable for two reasons: $\Omega$ is within a factor of a few of unity and yet it is apparently not
exactly equal to unity. If this estimate is correct it says that we live at a rather special time when
matter has just stopped dominating the expansion of the Universe. Why this should be remains a mystery.


28.5 The Cosmological Redshift 


Consider a photon which is emitted by an observer at the center of an expanding sphere of mass
$M$ and radius $a$ and received by an observer on the surface. Let us assume that the radius of the
sphere is small enough that $\dot a \ll c$. The received wavelength will then be greater than that emitted
because of the Doppler effect, and we therefore have a redshift

$$ 1 + z = \frac{\lambda_{\rm obs}}{\lambda_{\rm em}} = 1 + \dot a/c. \tag{28.17} $$

The fractional change in wavelength is $\Delta\lambda/\lambda = \dot a/c$. Any gravitational redshift effect is
$\Delta\lambda/\lambda \sim GM/ac^2$ and is negligible compared to the Doppler effect since $GM/ac^2 \sim (\dot a/c)^2 \ll \dot a/c$.

Now consider the change in the size of the sphere in the time $\Delta t = a/c$ that it takes for the
photon to make this trip. The ratio of sphere sizes is

$$ \frac{a_{\rm obs}}{a_{\rm em}} = \frac{a + \dot a\,\Delta t}{a} = 1 + \frac{\dot a\,\Delta t}{a} = 1 + \dot a/c. \tag{28.18} $$


Thus, for a small sphere, the fractional change in the wavelength of light is equal to the fractional
change in the sphere radius. If we have a photon which travels a large distance through a FRW
model then we can compute the net change in the wavelength by multiplying the effect due to a
series of infinitesimal steps. The result is that

$$ 1 + z = \frac{\lambda_{\rm obs}}{\lambda_{\rm em}} = \frac{a_{\rm obs}}{a_{\rm em}} \tag{28.19} $$

i.e. the wavelength grows in proportion to the scale factor.
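
The compounding of infinitesimal steps can be illustrated numerically. The sketch below multiplies the per-step factors $1 + (\dot a/a)\,dt$ along the photon path and compares the product with the ratio of scale factors. It assumes the matter dominated law $a \propto t^{2/3}$, for which $\dot a/a = 2/(3t)$, and arbitrary emission and observation times; the function names are ours.

```python
def compounded_redshift(hubble, t_em, t_obs, n_steps=100000):
    """Multiply the small-step wavelength factors (1 + H dt) from t_em to t_obs."""
    dt = (t_obs - t_em) / n_steps
    one_plus_z = 1.0
    for i in range(n_steps):
        t = t_em + i * dt
        one_plus_z *= 1.0 + hubble(t) * dt   # fractional stretch in this step
    return one_plus_z

def hubble_matter(t):
    """Expansion rate adot/a for a(t) proportional to t^(2/3)."""
    return 2.0 / (3.0 * t)

t_em, t_obs = 1.0, 8.0
print("product of steps:", compounded_redshift(hubble_matter, t_em, t_obs))
print("a_obs / a_em:    ", (t_obs / t_em) ** (2.0 / 3.0))
```

The product converges to $a_{\rm obs}/a_{\rm em}$ as the step size is reduced, which is the content of (28.19).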


28.6 The Horizon Problem 

The FRW equations provide a self-consistent mathematical model for a large, perhaps infinite, 
uniform density expanding Universe. However, it has a rather peculiar feature; at early times the 
distance over which light, and therefore any other causal influence, can propagate shrinks to zero, 
and moreover it shrinks faster than the Universe itself. 

To analyze this it is useful to introduce the concept of comoving coordinates $r$ which are related
to physical coordinates by

$$ x = a(t)\, r. \tag{28.20} $$

Thus the surface of our dust sphere has comoving radius $r = 1$, and the dust particles do not move
in comoving coordinates. Now the physical distance that a photon can travel in time interval $dt$
is just $dx = c\,dt$, corresponding to a change in comoving coordinate $dr = c\,dt/a(t)$. The integrated
comoving distance that a photon can travel since $t = 0$ is called the horizon $r_h$ and, for the matter
dominated expansion law $a \propto t^{2/3}$, is

$$ r_h(t) = c \int_0^t \frac{dt'}{a(t')} \propto \int_0^t \frac{dt'}{t'^{2/3}} \propto t^{1/3} \tag{28.21} $$

which tends to zero for $t \to 0$.
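
The horizon integral can also be evaluated numerically for a general power law. The following sketch assumes $a(t) = t^\alpha$ in units with $c = a_0 = t_0 = 1$ (our normalization, not the text's), and uses the substitution $t' = u^3$ to remove the integrable singularity at $t' = 0$; it is meant for decelerating laws with $\alpha \le 2/3$.

```python
def comoving_horizon(t, alpha, c=1.0, n_steps=20000):
    """Estimate r_h = c * integral_0^t dt'/a(t') for a(t') = t'**alpha.

    With t' = u**3 the integrand becomes 3 * u**(2 - 3*alpha), which is
    finite at u = 0 for alpha <= 2/3, so a midpoint rule can be used.
    """
    u_max = t ** (1.0 / 3.0)
    du = u_max / n_steps
    total = 0.0
    for i in range(n_steps):
        u = (i + 0.5) * du                      # midpoint of the i-th interval
        total += 3.0 * u ** (2.0 - 3.0 * alpha) * du
    return c * total

# Matter era: analytic result is 3 * t**(1/3)
print(comoving_horizon(8.0, 2.0 / 3.0), 3.0 * 8.0 ** (1.0 / 3.0))
# Radiation era: analytic result is 2 * t**(1/2)
print(comoving_horizon(4.0, 0.5), 2.0 * 4.0 ** 0.5)
```

In both cases the horizon grows as $t^{1-\alpha}$ and therefore shrinks to zero as $t \to 0$.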

Figure 28.4: The causal structure of FRW models is most clearly shown if we plot particle world-lines
in $\eta$-$r$ coordinates (i.e. conformal time vs comoving spatial coordinate). The vertical lines represent
comoving observers, while the diagonal lines are light rays. Causal influences are constrained to lie
within the light-cones. Since conformal time $\eta \to 0$ at the big-bang, this means that the initial
singularity (the line $\eta = 0$) is acausal, in the sense that comoving observers are initially out of causal
contact with each other. If, however, the Universe accelerates at early times, the initial singularity
is pushed back to $\eta = -\infty$ in this plot, and the big-bang is then causal.


A useful way to visualize the causal structure of the FRW model is to work in conformal time $\eta$
which satisfies $d\eta = c\,dt/a(t)$. Since photons have $dr = c\,dt/a(t)$ this means that if we plot world-lines
in $\eta$-$r$ space then comoving observers are lines of constant $r$ while photons are lines at $\pm 45$ degrees
(hence the terminology ‘conformal time’). This is shown in figure 28.4. This shows that comoving
observers which can now exchange information and physically influence each other were, at early
times, causally disconnected.

The problematic nature of the horizon is most clearly appreciated if we consider the photons of
the microwave background radiation. These photons have been propagating freely since they were
last scattered at a redshift of $z_{\rm dec} \simeq 10^3$. At that time, the horizon was smaller than the present
value by a factor $(t_0/t_{\rm dec})^{1/3} = (a_0/a_{\rm dec})^{1/2} \simeq z_{\rm dec}^{1/2} \simeq 30$. Consider photons arriving from opposite
directions (see figure 28.5). The regions which last scattered the radiation now arriving from these
directions were not able to causally influence each other at the time the radiation was scattered
(indeed they are not causally connected even today), yet the radiation we see is, aside from the
dipole anisotropy due to our motion, isotropic to a few parts in $10^5$. The standard FRW model
provides no explanation for how this degree of isotropy was established; it must be put in as initial
conditions.

The horizon problem is fundamentally due to the fact that the universal expansion is decelerating:
$\ddot a < 0$. If we consider a pair of neighboring observers who are currently receding from each other with
some velocity $v \ll c$, then at earlier times this recession velocity was larger and, at some sufficiently
early time, exceeded the speed of light. Before that time these observers were causally disconnected.
If one had instead a power-law expansion $a \propto t^\alpha$ with $\alpha > 1$ then $\ddot a > 0$, the integral for the
comoving horizon scale diverges at its lower limit $t \to 0$, and there is no horizon problem.


28.7 Cosmology with Pressure 

So far we have considered pressure-free dust. What happens if this dust is actually a dust of bombs
all primed to explode at a certain time? At that instant, the equation of state changes from $P = 0$
to $P \neq 0$ as the shrapnel, radiation etc. from the explosions will have positive pressure.

Naively, one might imagine that pressure would tend to counteract gravity and would cause the
universal expansion to decelerate less rapidly, but this is incorrect. It is true that for a finite sphere,
such an explosion would accelerate the outer layers, and this acceleration would work its way in, but
this acceleration is caused by pressure gradients which only arise by virtue of there being an edge to
the sphere. Deep within a very large sphere, there is no pressure gradient (at least for short times
after the explosions), so there are no $\nabla P$ forces acting.

Figure 28.5: The wiggly diagonal lines in this conformal space-time plot show the world lines of
two photons which we now see arriving from opposite directions, and which were last scattered at
points a, b. The horizontal dashed line indicates the moment of recombination of the plasma. Also
shown are the past light cones of the events a, b. The regions which can have causally influenced the
photons prior to their departure are very small compared to the separation of the scattering events.

Recall that for $P = 0$ the equations governing the scale factor of the universe and the density
are the acceleration equation

$$ \ddot a = -\frac{4}{3}\pi G\rho a, \tag{28.22} $$

the continuity equation

$$ \dot\rho = -3\frac{\dot a}{a}\rho, \tag{28.23} $$

and the energy equation

$$ \dot a^2 = \frac{8}{3}\pi G\rho a^2 + 2E_0 \tag{28.24} $$

with $E_0$ constant. These equations are not independent; any one can be derived from the other two.
They provide two independent equations which determine the two functions $a(t)$ and $\rho(t)$, the initial
conditions being set by the energy constant $E_0$ and also the time of the big-bang.

One consequence of non-zero pressure is to modify the continuity equation. For $P = 0$ this is the
statement that the mass within a comoving sphere is constant: $\rho a^3 = {\rm constant}$. If $P \neq 0$ this is not
correct. With $P > 0$, any volume of the Universe does work on its surroundings as it expands, and
since mass and energy are equivalent according to Einstein the mass is not conserved. Equating $c^2$
times the rate of change of the mass to the $P\,dV$ work for a sphere of radius $a$ gives

$$ d(4\pi\rho a^3 c^2/3) = -P\,dV = -4\pi P a^2 \dot a\, dt \tag{28.25} $$

and hence

$$ \dot\rho = -3\frac{\dot a}{a}\left(\rho + \frac{P}{c^2}\right) \tag{28.26} $$

which must be used in place of (28.23). Note that this equation is a simple consequence of
conservation of energy (the first law of thermodynamics) and the identification of energy and mass,
$E = Mc^2$.
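
It may help to see what the modified continuity equation implies for a simple equation of state. The worked lines below expand the $P\,dV$ relation (28.25) and then assume, purely for illustration, a constant ratio $w = P/\rho c^2$ (the symbol $w$ is our notation and does not appear in the text).

```latex
\begin{align*}
\frac{d}{dt}\left(\tfrac{4}{3}\pi\rho a^3 c^2\right) = -4\pi P a^2 \dot a
\quad &\Longrightarrow \quad
\dot\rho = -3\frac{\dot a}{a}\left(\rho + \frac{P}{c^2}\right), \\
P = w\rho c^2
\quad &\Longrightarrow \quad
\frac{\dot\rho}{\rho} = -3(1+w)\frac{\dot a}{a}
\quad \Longrightarrow \quad
\rho \propto a^{-3(1+w)} .
\end{align*}
```

Pressure-free dust ($w = 0$) gives $\rho \propto a^{-3}$, while a gas of photons with $P = \rho c^2/3$ ($w = 1/3$) gives $\rho \propto a^{-4}$.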


A positive pressure therefore causes a reduction in the mass enclosed within a comoving sphere.
Now Newtonian gravity theory can readily describe the motion of particles on radial orbits in the
potential generated by a time varying mass. The result is that the acceleration is still given by
(28.22), but that the term $E_0$ in the energy equation is no longer constant in time. Differentiating
(28.24) with respect to time gives

$$ \dot E_0 = \dot a\ddot a - \frac{4}{3}\pi G\dot\rho a^2 - \frac{8}{3}\pi G\rho a\dot a \tag{28.27} $$

or, using (28.22) and the continuity equation with pressure (28.26),

$$ \dot E_0 = 4\pi G P a\dot a/c^2. \tag{28.28} $$


In this respect, however, Newtonian theory is wrong. General relativity tells us that the term $E_0$
in the energy equation actually remains constant even with $P \neq 0$. This means that the Newtonian
acceleration equation (28.22) cannot be correct. Differentiating the energy equation (with $E_0 = {\rm constant}$)
and using the equation of continuity gives the modified acceleration equation

$$ \ddot a = -\frac{4}{3}\pi G\left(\rho + \frac{3P}{c^2}\right) a. \tag{28.29} $$

Thus we see that a positive pressure actually increases the deceleration of the Universal expansion.
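
The differentiation step leading to (28.29) can be written out explicitly. The lines below are a sketch of that algebra, assuming the energy equation (28.24) with constant $E_0$ and the continuity equation in the form $\dot\rho = -3(\dot a/a)(\rho + P/c^2)$ that follows from (28.25).

```latex
\begin{align*}
2\dot a\ddot a &= \tfrac{8}{3}\pi G\left(\dot\rho a^2 + 2\rho a\dot a\right)
  && \text{(time derivative of the energy equation, } \dot E_0 = 0\text{)} \\
\dot\rho a^2 &= -3 a\dot a\left(\rho + P/c^2\right)
  && \text{(continuity equation with pressure)} \\
2\dot a\ddot a &= -\tfrac{8}{3}\pi G\, a\dot a\left(\rho + 3P/c^2\right) \\
\ddot a &= -\tfrac{4}{3}\pi G\left(\rho + 3P/c^2\right) a .
\end{align*}
```

The factor of 3 multiplying the pressure arises because the $-3P/c^2$ from the continuity equation is not offset by anything, whereas the $-3\rho$ is partly cancelled by the $+2\rho$ from differentiating $a^2$.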


Equation (28.29) is an important and surprising result; it says that pressure gravitates in general
relativity. We should be clear what is meant by the density $\rho$ here, and, in particular, draw a
distinction between total mass-energy density and the proper mass density. Naively, one might
imagine the pressure term in (28.29) as being some $O(v^2/c^2)$ correction to the proper mass density
for the kinetic energy of random motions; this term has the right magnitude at least. This is not
correct; the density $\rho$ here is the total mass-energy density including kinetic energy. To reinforce
this, note that the proper mass density is not conserved in the transition from $P = 0$ to $P > 0$, since
some of the rest-mass of the bombs is consumed in creating the kinetic energy of shrapnel and any
energy in radiation. Since $\dot a$ and $E_0$ should not change discontinuously in this transition it is the
total mass-energy density, which is conserved through this transition, which must appear here.

At first sight it may appear unphysical to identify the source of the gravitational field with
$\rho + 3P/c^2$. This is because this quantity does not obey any fundamental conservation law; we can
change the pressure, within limits, at will by exploding bombs etc. The bombs example gives an
irreversible change in the pressure, but this is not an essential requirement. Consider instead a
universe consisting of a dust of baseball players who, at some predetermined instant after the big
bang, start practicing their pitching and lob balls to their neighbors. Since the players are receding
from each other they receive balls with less energy than that with which they were thrown, and they
must therefore be steadily consuming rest-mass in the process. If the players subsequently stop practicing,
the pressure returns to zero. While contrived, this example shows that we can in principle switch
the pressure on and off as we like.

To see why this is unsettling, consider what happens if we have a large region containing a 
dust of bombs primed to explode — would this then suddenly change the gravitational acceleration, 
and therefore the orbits, of satellites outside? It is pretty obvious that they cannot change, at 
least instantaneously, since this would violate causality. To extend the example, let us enclose the 
bombs in an initially slack balloon. The steady state after the bombs explode is to have the internal 
pressure balanced by tension in the enclosing membrane. Again, it does not seem reasonable that the 
gravitating mass as seen by an external observer (measuring the Keplerian orbits of satellites say) 
would change. How can the internal micro-physics in the balloon affect the total mass and energy 
seen by an external observer? The resolution, in this case, is that the extra gravitational attraction 
due to the positive pressure in the interior is canceled by a repulsion arising from the tension (a 
kind of negative pressure) in the balloon skin. Inside the balloon, however, the acceleration of test 
particles is correctly given by (28.29) rather than the Newtonian law (28.22).


Note that energy is not conserved locally in an expanding Universe with non-zero pressure, since 
each region does work on its surroundings. Recall that, in mechanics, energy is conserved only for 
a Lagrangian which has no explicit time dependence. The Lagrangian density for fields or particles 
in an expanding Universe, when written in comoving coordinates, contains the time-dependent scale 
factor $a(t)$ as a parameter, so energy is not conserved. For a finite sphere, bounded by a tense
membrane to provide appropriate boundary conditions, the total energy (including the potential
energy stored in the membrane) is conserved. Thus, if you want the energy budget to balance with
$P \neq 0$, just imagine the Universe to be contained within some large balloon.

An important feature that emerges from the fact that (28.4) applies equally with or without
pressure is that changing the equation of state of the universe can have no influence on the sign of
the energy term; if the universe is initially unbound then it stays that way forever and vice versa. A
related consequence is that, for open models, the late-time asymptotic velocity of a sphere $\dot a(t \to \infty)$
is also independent of the thermal history of the Universe.


28.8 Radiation Dominated Universe 


Another important constituent of the Universe is the thermal microwave background radiation
(MBR), which is a relic of the big bang. Currently, the density of the MBR is around $10^{-4}$ times
the critical density, and therefore also much less than that of the matter, but photons redshift and
lose energy as the universe expands (a consequence of the work done by their pressure). The
radiation density therefore scales as $\rho_{\rm rad} \propto a^{-4}$ while $\rho_{\rm matter} \propto a^{-3}$, so when the Universe was
about $10^{-4}$ times smaller than today (i.e. at a redshift of equality $1 + z_{\rm eq} = 10^4$) the matter and
radiation densities would have been equal. Since the expansion law in the matter dominated era is
$a \propto t^{2/3}$, and the present age of the Universe is $t_0 \sim 10^{10}$ yr, this transition from radiation to
matter domination occurred at $t_{\rm eq} = (1 + z_{\rm eq})^{-3/2}\, t_0 \simeq 10^4$ yr. At earlier times, the Universe was
radiation dominated.

In the radiation dominated era it is a very good approximation to neglect the constant $2E_0$ in
the energy equation (28.4), and with $\rho \propto a^{-4}$ the solution of this equation is again a power law
$a \propto t^\alpha$ but now with $\alpha = 1/2$.
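
The arithmetic behind the epoch of equality is simple enough to script. The sketch below takes the round numbers used in the text (a present radiation-to-matter density ratio of $10^{-4}$ and $t_0 = 10^{10}$ yr) as default arguments; the function names are illustrative, and the matter dominated law is extrapolated all the way back to equality, as in the text.

```python
def density_ratio(a, ratio_today=1.0e-4):
    """rho_rad / rho_matter at scale factor a, normalized so that a = 1 today.

    rho_rad scales as a^-4 and rho_matter as a^-3, so the ratio scales as 1/a.
    """
    return ratio_today / a

def equality_time_yr(ratio_today=1.0e-4, t0_yr=1.0e10):
    """Time of matter-radiation equality, assuming a proportional to t^(2/3) since then."""
    a_eq = ratio_today              # scale factor at which density_ratio equals 1
    return a_eq ** 1.5 * t0_yr      # t scales as a^(3/2) in the matter era

print("ratio at a_eq:", density_ratio(1.0e-4))
print("t_eq [yr]:    ", equality_time_yr())
```

With these inputs the time of equality comes out at about $10^4$ yr, in agreement with the estimate above.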


• In both radiation and matter dominated eras, the age of the Universe is roughly the inverse 
of the expansion rate, though the constant of proportionality depends on the details of the 
equation of state. 

• The Universe decelerates more with non-zero positive pressure. 

• The horizon problem persists in the radiation dominated era since $r_h = c\int dt/a(t) \propto t^{1/2}$. As
discussed, there is a ‘horizon problem’ in any universe which decelerates.


28.9 Number of Quanta per Horizon Volume 


Assuming that the universe is radiation dominated at early times it is interesting to ask: what is
the number of quanta per horizon volume (i.e. physical volume $V = c^3 t^3$)?

For thermal radiation most of the energy is in photons with $E \sim kT$; if we imagine the Universe
to be a periodic box of side $L$, the spacing of wave-numbers is $\delta k = 2\pi/L$ so, with $\hbar k \sim E/c$, the
number of photons in volume $L^3$ is $N \sim (k/\delta k)^3 \sim (E/\hbar c)^3 L^3$ and therefore the number density is
$n \sim E^3/(\hbar^3 c^3)$. The density is $\rho \sim nE/c^2 \sim E^4/(\hbar^3 c^5)$ (the Stefan-Boltzmann law) and this is
related to the age of the universe by $t \sim 1/\sqrt{G\rho}$, and therefore the relation between the age of the
universe and the characteristic energy of the quanta is

$$ t \sim \frac{\hbar^{3/2} c^{5/2}}{G^{1/2} E^2}. \tag{28.30} $$


We are ignoring here the fact that at very high energies there are more particle species that are in
equilibrium via number changing reactions. This is because number changing reactions for particles
of mass $m$ are ‘frozen out’ when the temperature falls below $T \sim mc^2/k$. More rigorously we should
include a factor for the number of degrees of freedom, but this only introduces a correction factor
of order unity.

The physical horizon volume is $V_h = (ct)^3$, so with $n \sim E^3/(\hbar^3 c^3)$ the number of quanta per horizon
volume is $N_h = nV_h \sim E^3 t^3/\hbar^3$ or, with (28.30),

$$ N_h \sim \frac{E^3}{\hbar^3}\,\frac{\hbar^{9/2} c^{15/2}}{G^{3/2} E^6} \sim \left(\frac{\hbar^{1/2} c^{5/2}}{G^{1/2} E}\right)^3 = \left(\frac{E_{\rm pl}}{E}\right)^3 \tag{28.31} $$

where we have defined the Planck energy

$$ E_{\rm pl} = \sqrt{\frac{\hbar c^5}{G}} \simeq 10^{19}\ {\rm GeV}. \tag{28.32} $$

It is also useful to define the Planck mass

$$ m_{\rm pl} = \frac{E_{\rm pl}}{c^2}. \tag{28.33} $$

One can also define the Planck time

$$ t_{\rm pl} = \hbar/E_{\rm pl} \simeq 10^{-43}\ {\rm s}. \tag{28.34} $$


Thus as we go back in time, the number of quanta per horizon volume decreases. Only for 
Universes much older than the Planck-time is it reasonable to assume a nearly uniform energy 
density. At Planck-scale energies, the fluctuations in the fields are expected to produce strong 
gravitational effects. 
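
The Planck quantities are easily evaluated. The sketch below uses rounded standard values of $\hbar$, $c$ and $G$ in SI units and illustrative function names; the last function is the order-of-magnitude estimate of the number of quanta per horizon volume, $(E_{\rm pl}/E)^3$, and should not be read as more precise than that.

```python
import math

HBAR = 1.0546e-34         # reduced Planck constant, J s
C = 2.9979e8              # speed of light, m/s
G = 6.674e-11             # gravitational constant, m^3 kg^-1 s^-2
JOULE_PER_GEV = 1.6022e-10

def planck_energy_gev():
    """E_pl = sqrt(hbar c^5 / G), expressed in GeV."""
    return math.sqrt(HBAR * C ** 5 / G) / JOULE_PER_GEV

def planck_mass_kg():
    """m_pl = E_pl / c^2 = sqrt(hbar c / G)."""
    return math.sqrt(HBAR * C / G)

def planck_time_s():
    """t_pl = hbar / E_pl = sqrt(hbar G / c^5)."""
    return math.sqrt(HBAR * G / C ** 5)

def quanta_per_horizon(energy_gev):
    """Order-of-magnitude number of quanta per horizon volume, (E_pl / E)^3."""
    return (planck_energy_gev() / energy_gev) ** 3

print("E_pl [GeV]:", planck_energy_gev())
print("m_pl [kg]: ", planck_mass_kg())
print("t_pl [s]:  ", planck_time_s())
print("N_h at E = 1 GeV:", quanta_per_horizon(1.0))
```

At a characteristic energy of 1 GeV the estimate gives of order $10^{57}$ quanta per horizon volume, falling to of order unity as $E$ approaches the Planck energy.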


28.10 Curvature of Space-Time 

So far we have concentrated on local properties of the universe (expansion rate, density, pressure
etc.) and have thus avoided any discussion of the global curvature of space-time, which, according
to general relativity, is caused by matter.

Space-time curvature is described by the metric tensor $g_{\mu\nu}$. This is the curved-space
generalization of the Minkowski metric $\eta_{\mu\nu}$ and is defined such that the proper separation of two events
with coordinate separation $dx^\mu$ is $ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu$. The metric, or line element, for a spatially
homogeneous and isotropic world model can be written as

$$ ds^2 = -d\tau^2 + a^2(\tau)\left(d\omega^2 + S_k^2(\omega)\,d\sigma^2\right) \tag{28.35} $$

where $d\sigma^2 = d\theta^2 + \sin^2\theta\,d\phi^2$ is the usual line element on the unit sphere.

• The radial coordinate $\omega$ is a comoving coordinate: comoving observers have constant $\omega$, $\theta$, $\phi$.

• The time coordinate $\tau$ is the proper time since the big-bang for a comoving observer.

• The line element (28.35) follows from geometrical considerations alone (see Gunn’s Saas Fee
lectures), and is independent of any specific theory of gravity.

• The function $S_k(\omega)$ takes one of three forms, corresponding to the value of the curvature
eigenvalue $k = 0, \pm 1$:

$$ S_k(\omega) = \begin{cases} \sin\omega & \text{for } k = +1 \\ \omega & \text{for } k = 0 \\ \sinh\omega & \text{for } k = -1 \end{cases} \tag{28.36} $$


  - For $k = 0$, the constant $\tau$ surfaces are spatially flat.

  - For $k = +1$, the constant $\tau$ surfaces are unbounded but spatially finite (or closed)
    3-dimensional analogs of the 2-sphere. The scale factor $a$ is then the curvature radius.

  - For $k = -1$, the constant $\tau$ surfaces are unbounded and spatially infinite. They are
    analogous to a saddle in two dimensions.

• A common alternative representation of the line element is to use radial coordinate $r = S_k(\omega)$,
in terms of which the line element is

$$ ds^2 = -d\tau^2 + a^2(\tau)\left(\frac{dr^2}{1 - kr^2} + r^2\,d\sigma^2\right). \tag{28.37} $$

• It is common to define the conformal time $\eta$ such that $d\eta = d\tau/a(\tau)$ and the line element is
then

$$ ds^2 = a^2(\tau)\left(-d\eta^2 + d\omega^2 + S_k^2(\omega)\,d\sigma^2\right). \tag{28.38} $$

Radial photons move along 45 degree diagonals in $\eta$-$\omega$ space.


So far this is pure geometry plus the assumptions of homogeneity and isotropy. The field equations
of general relativity provide a relation between the curvature of space-time and the matter content.
Here, this boils down to the equation

$$ \left(\frac{da}{d\tau}\right)^2 = \frac{8\pi}{3} G\rho a^2 - c^2 k \tag{28.39} $$

with $k = \pm 1$ the curvature eigenvalue. This is equivalent to the energy equation (28.4), with the
energy constant $2E_0 \to -c^2 k$, and with an appropriate scaling of $a$. In the Newtonian analysis, the
scale factor is arbitrary. In general relativity, we take the scale factor to be the curvature scale.

This tells us that spatially closed and open models correspond to bound and unbound cosmologies 
respectively. The borderline case is spatially flat, and can be obtained as the limiting case of either 
closed or open models. 

Note that the curvature scale is a comoving scale, and is therefore fixed. Changing the equation 
of state can have no influence on the geometry of the Universe. However, in inflation, the curvature 
scale is stretched and becomes exponentially large, and in this way the Universe can be prepared in 
a state which is effectively spatially flat. 

Finally, one should not confuse space-time curvature and the curvature of the $\tau = {\rm constant}$
surfaces. In the $k = 0$ solution, the constant $\tau$ surfaces are spatially flat, but the mass density does
not vanish so the space-time is still curved. In open models, the density, and therefore space-time
curvature, tend to zero at late times, but the constant $\tau$ surfaces are still hyperbolic, saddle-like
surfaces; the metric in that limit (with $a(\tau) \propto \tau$) is simply a re-parameterization of the flat
space-time Minkowski metric.

The spatial metric of the unit 2-sphere is $dl^2 = d\theta^2 + \sin^2\theta\,d\phi^2$. The length of a line element
perpendicular to the radial direction (i.e. $d\theta = 0$) is $dl = \sin\theta\,d\phi$ and the circumference of a circle
is $l = 2\pi\sin\theta$. This increases linearly with $\theta$ for $\theta \ll 1$, reaches a maximum of $2\pi$ at $\theta = \pi/2$, and
then shrinks to zero at the antipodal point $\theta = \pi$. Similarly, the spatial geometry of the closed model
is a 3-sphere, with (for unit radius) $dl^2 = d\omega^2 + \sin^2\omega\,d\sigma^2$. A line element on a spherical surface
perpendicular to the radius vector (i.e. $d\omega = 0$) has $dl^2 = a^2\sin^2\omega\,d\sigma^2$ with
$d\sigma^2 = d\theta^2 + \sin^2\theta\,d\phi^2$, so the total area of a sphere of radius $\omega$ is $A = 4\pi a(t)^2\sin^2\omega$.
Just as in the 2-dimensional case, this peaks at $\omega = \pi/2$ and shrinks back to zero at the
antipodal point $\omega = \pi$.

The closed model is finite, yet has no boundary. However, at least if we restrict attention to
zero-pressure equation of state, we are free to take only a finite part of the total solution $\omega < \omega_{\rm max}$.
This is a spherically symmetric mass configuration, and so should match onto the Schwarzschild
solution for a point mass $m$, for which the space-time metric is (in units such that $c = G = 1$)

$$ -(d\tau)^2 = -\left(1 - \frac{2m}{r}\right)(dt)^2 + \left(1 - \frac{2m}{r}\right)^{-1}(dr)^2 + r^2\left((d\theta)^2 + \sin^2\theta\,(d\phi)^2\right). \tag{28.40} $$


Comparing the angular part of the metric it is apparent that the Schwarzschild radial coordinate $r$
and the FRW ‘development angle’ $\omega$ are related by $r = a\sin\omega$. Now a particle at the edge of the
FRW model can equally be considered to be a radially moving test particle in the Schwarzschild
geometry. We found in problem (???) that the orbits of such particles satisfy

$$ \left(\frac{dr}{d\tau}\right)^2 = \frac{2Gm}{r} + {\rm constant}. \tag{28.41} $$


Compare this with the energy equation

$$ \left(\frac{da}{d\tau}\right)^2 = \frac{2GM}{a} + {\rm constant} \tag{28.42} $$

where we have defined the mass parameter $M = 4\pi\rho a^3/3$. Multiplying (28.42) by $\sin^2\omega$ and using
$r = a\sin\omega$ gives $(dr/d\tau)^2 = 2GM\sin^3\omega/r + {\rm constant}$, which implies that the
Schwarzschild mass parameter is

$$ m = M\sin^3\omega. \tag{28.43} $$

This is interesting. For $\omega \ll 1$, the mass increases as $\omega^3$ as expected. However, the mass is
maximized for a model with a development angle of $\pi/2$, or half of the complete closed model. If we
take a larger development angle, and therefore include more proper-mass, the Schwarzschild mass
parameter decreases. To the outside world, this positive addition of proper mass has negative total
energy. This means that the negative gravitational potential energy outweighs the rest-mass energy.
The gravitating mass shrinks to zero as $\omega \to \pi$. Evidently a nearly complete closed model with
$\omega = \pi - \epsilon$ looks, to the outside world, like a very low mass, that of a much smaller closed model
section with $\omega = \epsilon$.

The total energy of a complete closed universe is therefore zero. Zel’dovich, and many others 
subsequently, have argued that this is therefore a natural choice of world model if, for instance, one 
imagines that the Universe is created by some kind of quantum mechanical tunneling event. To 
be consistent with the apparent flatness of the Universe today one would need to assume that the 
curvature scale has been inflated to be much larger than the present apparent horizon size. 

It is interesting to compare the external gravitational mass with the total proper mass. The
volume element of the parallelepiped with legs $d\omega$, $d\theta$, $d\phi$ is

$$ d^3x = (a\,d\omega) \times (a\sin\omega\,d\theta) \times (a\sin\omega\sin\theta\,d\phi), \tag{28.44} $$

so the total proper mass interior to $\omega$ is

$$ M_{\rm proper} = \rho a^3 \int_0^\omega d\omega'\,\sin^2\omega' \int_0^\pi d\theta\,\sin\theta \int_0^{2\pi} d\phi = 2\pi\rho a^3\left(\omega - \tfrac{1}{2}\sin 2\omega\right). \tag{28.45} $$

The gravitational mass (28.43) and proper mass (28.45) are shown in figure 28.6.


These partial closed FRW models start from a singularity of infinite density and then expand,
passing through the Schwarzschild radius $r = 2Gm/c^2$. With $r = a\sin\omega$, $m = M\sin^3\omega$, and
$a = M(1 - \cos\eta)$ (in units with $G = c = 1$), particles on the exterior cross the Schwarzschild
radius at conformal time $\eta$ when $1 - \cos\eta = 2\sin^2\omega$. For $\omega \ll 1$, this occurs when $\eta = 2\omega$.
Such solutions spend the great majority of their time outside the Schwarzschild radius. For the case
$\omega = \pi/2$, i.e. half of the complete solution, the exterior particles just reach the Schwarzschild
radius. It may seem strange that the matter in these models can expand from within the
Schwarzschild radius, but this is indeed the case. If one considers only the collapsing phase of
these models then one has the classic model for black-hole formation as developed by Oppenheimer
and Snyder. The spherical mass collapses to a point, and photons leaving the surface can only
escape to infinity if they embark on their journey while the radius exceeds the Schwarzschild radius.
The expanding phase of these models is just the time reverse of such models; what we have is a
‘white-hole’ solution. The initial singularity is visible to the outside world (eventually) just as
photons from the outside can fall in to the final singularity.
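
The crossing condition can in fact be solved in closed form with the half-angle identity. The lines below are a short worked version, restricted (as an assumption on our part) to development angles $\omega \le \pi/2$ and to one cycle $0 \le \eta \le 2\pi$.

```latex
\begin{align*}
1 - \cos\eta &= 2\sin^2(\eta/2) = 2\sin^2\omega \\
\Longrightarrow \quad \eta &= 2\omega \quad \text{(outward crossing, expanding phase)}, \\
\eta &= 2\pi - 2\omega \quad \text{(inward crossing, collapsing phase)} .
\end{align*}
```

For $\omega = \pi/2$ the two roots coincide at $\eta = \pi$, the moment of maximum expansion, which is the statement that the exterior particles only just reach the Schwarzschild radius.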

One can visualize the geometry of the solution in a 2-dimensional analog (figure 28.7). The interior is
part of a 2-sphere, which matches smoothly onto an exterior solution much like a depressed rubber
sheet. The closed solutions with $\omega > \pi/2$ are slightly peculiar in the sense that, at the edge, the
radius $r$ is decreasing with increasing $\omega$. This means that the FRW geometry matches onto a ‘throat’
like exterior geometry.

In the limit of small development angle w max <C 1 both the proper mass (28.45) and the gravita¬ 
tional mass (28.431 are given, to lowest order in w max , by 


dIp r oper — d7g Tav — il7w max — 


47T 


P a3 U max 


(28.46) 


by the gravitational mass. However, keeping terms of next higher non vanishing order one obtains 
for the ratio of proper to gravitational mass 


M, 


Mr, 


proper 


— 1 + 


+ 0(4ax)' 


(28.47) 


This result will prove useful below. 
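
As a numerical sanity check on the small-angle expansion, the following sketch compares an exact mass ratio with its leading-order series. It assumes closed forms that are not quoted in this section: $M_{\rm proper} = 2\pi\rho a^3(\omega - \sin\omega\cos\omega)$, obtained by integrating the density over the volume of the 3-sphere cap, and $M_{\rm grav} = \frac{4\pi}{3}\rho a^3\sin^3\omega$. Under those assumptions the leading correction works out to $\frac{3}{10}\omega^2$. The function names are illustrative only.

```python
import math

def mass_ratio(omega):
    """Exact M_proper / M_grav for development angle omega (radians).

    Assumed forms (common factor rho * a^3 cancels):
      M_proper ~ 2 pi (omega - sin(omega) cos(omega))
      M_grav   ~ (4 pi / 3) sin(omega)^3
    """
    proper = 2.0 * math.pi * (omega - math.sin(omega) * math.cos(omega))
    grav = (4.0 * math.pi / 3.0) * math.sin(omega) ** 3
    return proper / grav

def series_ratio(omega):
    """Leading-order expansion 1 + (3/10) omega^2."""
    return 1.0 + 0.3 * omega ** 2

for omega in (0.05, 0.1, 0.2, 0.5):
    print(omega, mass_ratio(omega), series_ratio(omega))
```

For $\omega = \pi/2$ the same expression gives a ratio of $3\pi/4$, so the proper mass exceeds the gravitational mass by a substantial factor for large development angles.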

Figure 28.6: The proper mass and gravitational mass for a partial closed FRW cosmology are plotted
against the development angle $\omega$.

28.11 Problems 

28.11.1 Energy of a Uniform Expanding Sphere 

Compute the total kinetic energy and the total gravitational binding energy for a uniformly expanding
constant density sphere of radius $a$, density $\rho$ and surface velocity $\dot a$.

28.11.2 Solution of the FRW Energy Equation 

Verify that a parametric solution of the energy equation

$$\dot a^2 = \frac{8}{3}\pi G \rho a^2 + 2E_0$$

is

$$a(\eta) = A(1 - \cos\eta) \tag{28.48}$$
$$t(\eta) = B(\eta - \sin\eta) \tag{28.49}$$

and find the relation between constants $A$, $B$ and the energy constant $E_0$.

Figure 28.7: A partial closed FRW model consists of part of a 3-sphere which matches smoothly
onto the horn-like Schwarzschild geometry. The 2-dimensional analog of this composite geometry 
can be visualized as part of a parabolic sphere connecting smoothly onto a 2-dimensional horn. 





Chapter 29 


Inflation 


29.1 Problems with the FRW Models 

We have already discussed two problematic features of the matter or radiation dominated FRW 
models. One of these is the horizon problem, which says that the remarkable uniformity we see
on large scales today must be imposed a-causally. The other is the flatness problem, which is
that the observational fact that $\Omega$ is currently not very different from unity requires it to have
been astonishingly close to unity in the distant past. Both of these problems render the models 
unattractive since the basic properties of flatness and homogeneity are not really explained by the 
theory, rather they must be imposed as finely tuned initial conditions. 

To these problems we can add the monopole problem. In grand unified theories (GUTs) massive 
magnetic monopoles are predicted to exist. In the hot big bang model, at the time of GUT symmetry 
breaking (as the Universe cools through the GUT temperature of around $10^{16}$ GeV) these monopoles
appear as topological defects, with a number density on the order of one per horizon size. These 
objects are a definite prediction of GUTs, yet their existence in anything like this abundance would 
be a disaster for cosmology, as they would have a density today hugely in excess of that observed. 

There are also a number of additional unsettling features of the hot big bang model. One might 
ask what happened before the initial singularity? What are the seeds of the structure that we see in 
the Universe? What explains the baryon asymmetry of the Universe? There are now $\sim 10^8$ photons
per baryon, which seems to imply that there was initially a slight asymmetry between baryons and
anti-baryons at the one part in $10^8$ level.


29.2 The Inflationary Scenario 


In the inflationary scenario many of these problems appear to be solved or at least ameliorated. 
The essence of inflation is to assume that at early times the Universe passed through a phase with 
a strongly negative pressure (i.e. positive tension). 

Let us start with the horizon problem. As we have already discussed, this can be traced directly 
to the deceleration of the expanding Universe; if $\ddot a < 0$ then the velocity difference between any
two observers, which is proportional to $\dot a$, decreases with time. Therefore, going back in time, the
relative velocity inexorably increases and at some finite time in the past reaches the speed of light
$c$, and before that the two observers cannot exchange information or causal influences.

The only way to avoid this is for the Universe to have undergone an accelerating phase with $\ddot a > 0$
in its early history. From the acceleration equation (28.29) this requires $\rho + 3P/c^2 < 0$, or a strong
negative pressure $P < -\rho c^2/3$. This is the strange, and somewhat counterintuitive, feature of the
general relativistic expansion law; just as a positive pressure augments the gravitational deceleration,
a sufficiently strong negative pressure can cause the expansion to accelerate.

At first sight it is hard to see how a negative pressure can arise. Certainly, for a gas of particles 
interacting through localized collisions, the pressure cannot be negative. As shown in Weinberg’s 
book, for instance, a relativistic ideal gas must have pressure in the range $0 < P < \rho c^2/3$. Also, we
can consider pressure to be the flux of momentum. A particle moving in the positive $x$ direction
carries a positive $x$-component of momentum, and therefore the flux of $x$-momentum passing in the
positive x-direction through a surface must be positive. 

However, if we consider fields, rather than particles, then the possibility of negative pressure is 
not at all unreasonable. After all, the most commonplace field that we can feel macroscopically is 
the magnetic field. Anyone who has played with a pair of bar-magnets or pulled magnets off a fridge 
knows that such fields have strong tension. However, such fields do not have isotropic tension; there
is tension along the field lines (you have to do work to stretch the field out and create more of it)
but in the transverse directions the opposite is true; as we see from images of the field produced
with iron filings, the field between a pair of magnets clearly wants to burst out sideways. This follows
directly from energetic considerations, along with flux conservation. Imagine you try to confine the 
field to pass through a smaller area. Flux conservation means that the field strength must increase 
inversely with the area, but the energy density scales as the square of the field strength, so the 
total energy is larger the smaller the cross-sectional area. A static magnetic field then has negative 
pressure along the field lines but positive pressure in the transverse directions. 

This anisotropy of the pressure for a macroscopic quasi-static magnetic field is associated with 
the fact that electromagnetism is a vector field. Now in grand unified theories, there are also scalar 
fields such as the Higgs field. The leap of inspiration which led to the concept of inflation was the 
realization that a field like this can have a pressure which is isotropic — this is natural enough since 
a scalar field cannot point in any particular direction — and may have the negative pressure required 
to make the universal expansion accelerate. As discussed in §18.4, a possible solution of the field
equations for a massive scalar field at early times is a spatially and temporally constant field value. 
The energy density and pressure (i.e. diagonal components of the stress-energy tensor) for a scalar 
field are 


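$$\rho c^2 = \frac{1}{2c^2}\dot\phi^2 + \frac{1}{2}(\nabla\phi)^2 + V(\phi) \tag{29.1}$$
$$P = \frac{1}{2c^2}\dot\phi^2 - \frac{1}{6}(\nabla\phi)^2 - V(\phi) \tag{29.2}$$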


We are following the usual convention in cosmology that any mass term $m^2 c^2 \phi^2/\hbar^2$ is considered
part of the potential function $V(\phi)$. A static and homogeneous field configuration (one with $\dot\phi = 0$
and $\nabla\phi = 0$) therefore has density $\rho = V(\phi)/c^2$ and pressure $P = -V(\phi) = -\rho c^2$. Such a
potential dominated field configuration therefore amply satisfies the inequality $P < -\rho c^2/3$.
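
The dependence of the equation of state on the balance of kinetic and potential energy can be made concrete with a few lines of code. This is a minimal sketch assuming a spatially homogeneous field, so that the density and pressure reduce to $\rho c^2 = K + V$ and $P = K - V$ with $K = \dot\phi^2/2c^2$; the function names are illustrative.

```python
def equation_of_state(kinetic, potential):
    """Return w = P / (rho c^2) for a homogeneous scalar field.

    kinetic   : phidot^2 / (2 c^2)   (assumed gradient-free field)
    potential : V(phi)
    """
    rho_c2 = kinetic + potential
    pressure = kinetic - potential
    return pressure / rho_c2

def accelerates(kinetic, potential):
    """True when P < -rho c^2 / 3, i.e. when rho + 3P/c^2 < 0."""
    return equation_of_state(kinetic, potential) < -1.0 / 3.0

print(equation_of_state(0.0, 1.0))  # potential dominated field
print(equation_of_state(1.0, 0.0))  # kinetic dominated field
print(accelerates(0.1, 1.0), accelerates(1.0, 1.0))
```

In this picture the expansion accelerates only while the kinetic term is less than half of the potential term.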

With $P = -\rho c^2$, the continuity equation tells us that $\dot\rho = 0$. The cosmological expansion does
work against the tension of the field at just the rate required to keep the density constant; for this 
reason the inflationary universe has been dubbed the ultimate free lunch. The acceleration equation 
is then 

$$\ddot a = -\frac{4}{3}\pi G(\rho + 3P/c^2)\,a = \frac{8}{3}\pi G\rho\, a, \tag{29.3}$$

with $\rho$ = constant. The general solution of this equation is

$$a(t) = a_+ e^{+Ht} + a_- e^{-Ht} \tag{29.4}$$

with

$$H = \sqrt{8\pi G\rho/3} = \text{constant}. \tag{29.5}$$

For generic initial conditions, a potential dominated universe will tend towards an exponentially
expanding solution $a \propto e^{Ht}$.
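
The approach to the exponentially expanding solution can be checked numerically. The sketch below integrates $\ddot a = H^2 a$ with a hand-written fourth-order Runge-Kutta step and compares the result with the sum of growing and decaying exponentials, fixing the two coefficients from the initial conditions. Units with $H = 1$ and the particular initial values are arbitrary choices made for illustration.

```python
import math

def coefficients(a0, adot0, H):
    """Coefficients (a_plus, a_minus) matching a(0) = a0, da/dt(0) = adot0."""
    a_plus = 0.5 * (a0 + adot0 / H)
    a_minus = 0.5 * (a0 - adot0 / H)
    return a_plus, a_minus

def analytic(t, a_plus, a_minus, H):
    """General solution a(t) = a_plus e^{Ht} + a_minus e^{-Ht}."""
    return a_plus * math.exp(H * t) + a_minus * math.exp(-H * t)

def integrate(a0, adot0, H, t_end, steps=20000):
    """RK4 integration of a'' = H^2 a; returns (a, da/dt) at t_end."""
    dt = t_end / steps
    a, v = a0, adot0
    for _ in range(steps):
        k1a, k1v = v, H * H * a
        k2a, k2v = v + 0.5 * dt * k1v, H * H * (a + 0.5 * dt * k1a)
        k3a, k3v = v + 0.5 * dt * k2v, H * H * (a + 0.5 * dt * k2a)
        k4a, k4v = v + dt * k3v, H * H * (a + dt * k3a)
        a += dt * (k1a + 2 * k2a + 2 * k3a + k4a) / 6.0
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return a, v

# An initially contracting universe (adot0 < 0) still ends up expanding.
H, a0, adot0 = 1.0, 1.0, -0.5
a_plus, a_minus = coefficients(a0, adot0, H)
a_num, v_num = integrate(a0, adot0, H, t_end=10.0)
print(a_num, analytic(10.0, a_plus, a_minus, H))
print("expansion rate / H =", v_num / (a_num * H))
```

Only the finely tuned case $\dot a(0) = -H a(0)$, for which $a_+ = 0$, fails to approach exponential expansion.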

During inflation, the comoving horizon size (defined here as the comoving distance that a photon
can travel per expansion time) is $r_H \sim cH^{-1}/a$. Since $H$ is constant this decreases exponentially
as $r_H \propto e^{-Ht}$. At early times during inflation photons can travel great comoving distances but this
decreases as time goes on. In a viable inflationary model, inflation cannot continue forever, but
must end, with conversion of the energy density (all stored in the scalar field) into ‘ordinary’
matter with $P = \rho c^2/3$, i.e. we must make a transition from a scalar field dominated universe to
a radiation dominated hot big bang model. Side-stepping, for the moment, the issue of exactly
how this so-called ‘re-heating’ occurs, the overall behavior of the comoving horizon scale (as we
have defined it above) is shown as the solid line in figure 29.1. This allows the possibility that the
entire Universe was initially in causal contact. Let’s look at this from the point of view of a pair 
of comoving observers. These have a constant comoving separation, as indicated by the horizontal 
dashed line say. During inflation, the velocity difference between these observers increases as they 
accelerate apart, and a pair of observers with initial relative velocity v < c will at some time lose 
causal contact with each other once their relative velocity reaches the speed of light. If the universe 
later becomes radiation dominated, the relative velocity will subsequently fall and these observers 
can regain causal contact. For those who feel uneasy with the somewhat hand-waving definition of 
the horizon size as the distance light can travel in an expansion time, consider instead the rigorous 
definition of the comoving distance to a distant source as a function of the ‘look-back time’ $\tau = t_0 - t$:

$$r(\tau) = c\int_0^{\tau} \frac{d\tau'}{a(t_0 - \tau')}. \tag{29.6}$$


In the matter dominated era this increases with increasing $\tau$ at first, but tends towards a limiting
asymptote. Back in the inflationary era, however, this integral grows exponentially and becomes 
arbitrarily large. 
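
The contrast between the two regimes can be illustrated with closed-form versions of the comoving distance integral. This is a sketch under simplifying assumptions that are not part of the text: a pure matter era with $a \propto t^{2/3}$ extending back to $t = 0$, and a pure exponential phase with $a \propto e^{-H\tau}$, where $\tau$ is then measured backwards from the end of inflation. Units with $c = 1$ and the helper names are illustrative.

```python
import math

def r_matter(tau, t0=1.0, a0=1.0, c=1.0):
    """Comoving distance for a(t) = a0 (t/t0)^(2/3), look-back time tau <= t0."""
    return 3.0 * c * t0 / a0 * (1.0 - (1.0 - tau / t0) ** (1.0 / 3.0))

def r_inflation(tau, H=1.0, a_end=1.0, c=1.0):
    """Comoving distance for a = a_end exp(-H tau), tau measured back from t_end."""
    return c / (a_end * H) * math.expm1(H * tau)

def r_numeric(a_of_lookback, tau, c=1.0, n=100000):
    """Midpoint-rule evaluation of c * integral_0^tau dtau' / a(tau')."""
    h = tau / n
    return c * h * sum(1.0 / a_of_lookback((i + 0.5) * h) for i in range(n))

# Matter era: the distance saturates at 3 c t0 / a0.
print([round(r_matter(x), 4) for x in (0.5, 0.9, 0.999, 1.0)])
# Exponential phase: the distance grows without bound.
print([round(r_inflation(x), 2) for x in (1.0, 5.0, 10.0)])
# Cross-check of the closed form against direct quadrature.
print(r_numeric(lambda s: math.exp(-s), 5.0), r_inflation(5.0))
```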

What about the flatness problem? Recall that departure from flatness is an indication of an 
imbalance between the kinetic and potential energy terms in the energy equation 


$$\dot a^2 = \frac{8}{3}\pi G\rho a^2 - kc^2. \tag{29.7}$$

For an exactly flat universe $k = 0$ and these two terms are exactly equal. Now consider a universe
which is initially open say, with $k = -1$. During an inflationary phase, the expansion accelerates,
$\dot a$ increases, as must the potential energy term on the right hand side. Inflation acts to increase,
exponentially, the kinetic and potential terms in the energy equation. Thus even if there is a non-zero
initial energy constant, it will tend to become exponentially small at the end of inflation. Inflation
therefore drives the universe towards flatness; the $\Omega = 1$ state becomes an attractor rather than an
unstable state. Another way to look on this is to realize that in FRW models (and the inflationary
universe is an FRW model, just one with a weird equation of state) the curvature scale is a
comoving scale. Relative to the horizon scale, the curvature scale is stretched exponentially. Thus 
it might be that our universe is open or closed, but that the curvature scale has been stretched to 
be enormously larger than the currently observable region of the universe. 
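
How quickly the departure from flatness is erased can be seen in a simple special case. The sketch assumes an open ($k = -1$) universe with strictly constant density, for which the energy equation is solved by $a = (c/H_\Lambda)\sinh(H_\Lambda t)$ with $H_\Lambda = \sqrt{8\pi G\rho/3}$. The density parameter is then $\Omega = \tanh^2(H_\Lambda t)$, so $1 - \Omega = 1/\cosh^2(H_\Lambda t)$. The symbol $H_\Lambda$ and the function name are introduced here for illustration.

```python
import math

def one_minus_omega(Ht):
    """1 - Omega for the assumed open, constant-density solution a ~ sinh(Ht)."""
    return 1.0 / math.cosh(Ht) ** 2

for n_efolds in (0, 1, 5, 20, 60):
    print(n_efolds, one_minus_omega(float(n_efolds)))
```

At late times $1 - \Omega \simeq 4e^{-2H_\Lambda t}$, so each e-folding suppresses the departure from flatness by a factor $e^2$.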

What about the monopole problem? These are topological defects of a field. Inflation allows 
this field to be coherent over very large scales; up to the initial comoving horizon scale. Provided 
the universe re-heats to a temperature less than the GUT scale, monopoles — which have a mass 
around the GUT energy scale — will not be effectively created. 

It is interesting to ask, how many e-foldings of inflationary expansion are required in order to 
establish causality over the region of the universe (size $l \sim c/H_0$) that we can currently observe?
The answer depends on the temperature at which reheating occurs. If this reheating temperature is
around the energy scale of grand unification, or $T \sim T_{\rm GUT} \sim 10^{16}$ GeV, then the temperature falls by
about a factor $10^{25}$ before the Universe becomes matter dominated at a temperature of about 10 eV.
During that period the comoving horizon grows as $\propto t^{1/2} \propto a \propto 1/T$, or by about 25 orders of
magnitude. Once the universe becomes matter dominated the horizon grows as $\propto t^{1/3} \propto a^{1/2}$ or
by about another factor of 100. The current horizon is therefore about $10^{27} \simeq e^{62}$ times larger now
than at the reheating time, so we need about 60 e-foldings of inflation.
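
The arithmetic of this estimate is easy to reproduce. The sketch below simply takes the two growth factors quoted above as inputs (the round numbers $10^{25}$ and $10^{2}$ are the order-of-magnitude values from the text, not precise measurements) and converts their product into a number of e-foldings; the function name is illustrative.

```python
import math

def efolds_required(radiation_growth=1e25, matter_growth=1e2):
    """Natural log of the total growth of the comoving horizon since reheating."""
    return math.log(radiation_growth * matter_growth)

n = efolds_required()
print(f"total growth = {1e25 * 1e2:.0e}, e-foldings needed = {n:.1f}")

# A lower reheating temperature shortens the radiation era and reduces the requirement.
print(f"{efolds_required(radiation_growth=1e15):.1f}")
```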


29.3 Chaotic Inflation 

Originally, it was imagined that the field driving inflation, the inflaton field, had a w-shaped potential
of the kind involved in spontaneous symmetry breaking with the Higgs field. For reasons we shall
not go into here, such models have fallen out of favor. Instead, most attention is currently focused
on so-called chaotic inflation models in which the field has a potential function as sketched in figure
29.2. It is assumed that the field starts out at some point far from the origin, and then evolves to
smaller values much as a ball rolling down a hill. In this section we shall explore what is required in
order to obtain a viable inflationary scenario, i.e. one in which there are sufficiently many e-foldings.

Figure 29.1: The evolution of the comoving horizon scale (heavy solid line) in a Universe which
passes through three phases; an inflationary stage followed by a radiation dominated and then a
matter dominated era. The ordinate is logarithmic and the abscissa is linear in the inflationary era
and logarithmic thereafter. The diagonal line labeled $r_{\rm pl}$ indicates the Planck length. The horizontal
dashed line indicates the comoving separation of a pair of observers. This length scale appears at
the Planck scale at some time during the inflationary era. We have no adequate description of
physics below and to the left of the Planck scale line. The observers then accelerate away from one
another. If they were to exchange signals they would perceive an increasing redshift. The separation
between the observers reaches the horizon scale at the point labeled ‘a’. At that time their relative
velocity reaches the speed of light and their relative redshift becomes infinite. Subsequently the
observers are unable to exchange signals. At the reheating epoch the Universe starts to decelerate,
and at point ‘b’ the recession velocity falls below the speed of light. The observers then re-appear
on each other’s horizon; they can exchange signals which are received with steadily decreasing
redshift. The separation chosen here is such that it re-enters the horizon during the radiation
dominated era. Larger separations enter the horizon at later times. The current horizon scale is
$ct_0 \sim c/H_0 \sim 3000$ Mpc. Since the comoving horizon scale is proportional to $t^{1/3}$ in the matter era,
the horizon scale at $t_{\rm eq}$ is smaller than the current horizon by a factor $\sim 100$, or about 30 Mpc, the
scale of super-clusters. The horizontal dashed line might represent the size of a region encompassing
the matter now comprising a galaxy say.

For concreteness, consider a field with Lagrangian density

$$\mathcal{L} = \frac{1}{2c^2}\dot\phi^2 - \frac{1}{2}(\nabla\phi)^2 - \frac{\lambda}{\hbar c}\phi^4. \tag{29.8}$$

This is a massless field with a self-interaction term parameterized by the dimensionless constant
$\lambda$. Note that the Lagrangian density has units of energy density $[\mathcal{L}] = M L^{-1} T^{-2}$ so the field has
dimensions $[\phi] = M^{1/2} L^{1/2} T^{-1}$ (in natural units $L = 1/M$, $T = 1/M$ the field has units of mass).
Assuming the field to be spatially uniform, the equation of motion is

$$\ddot\phi + 3H\dot\phi + \frac{4\lambda c}{\hbar}\phi^3 = 0, \tag{29.9}$$

where the last term is the potential gradient and the second term is the damping due to the cosmological
expansion. The energy density and pressure are

$$\rho c^2 = \frac{1}{2c^2}\dot\phi^2 + \frac{\lambda}{\hbar c}\phi^4 \tag{29.10}$$
$$P = \frac{1}{2c^2}\dot\phi^2 - \frac{\lambda}{\hbar c}\phi^4 \tag{29.11}$$

and the expansion rate is given by

$$H^2 = \frac{8}{3}\pi G\rho = \frac{8\pi G}{3c^2}\left(\frac{1}{2c^2}\dot\phi^2 + \frac{\lambda}{\hbar c}\phi^4\right). \tag{29.12}$$

Figure 29.2: In the chaotic inflation scenario, the potential function $V(\phi)$ is assumed to be a monotonically
increasing function with $V(0) = 0$. The potential could simply be a mass term $V \propto \phi^2$ or
perhaps $V \propto \phi^4$.

Equation (29.9) is like that of a ball rolling down a hill with a frictional force, the coefficient of
friction $H$ being dependent on the field and the field velocity through (29.12). For such a system
there are two limiting types of behavior, depending on the value of the field. In one, the friction is
negligible and the field is in free-fall with $\ddot\phi$ equal in magnitude to the potential gradient. In the other, the friction
is important, the first term in (29.9) is negligible compared to the other terms and the field moves
at a ‘terminal velocity’ such that the friction force just balances the potential gradient.

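Let’s assume, for the moment, that the former is the case. The effective equation of motion is
then

$$\ddot\phi + \frac{4\lambda c}{\hbar}\phi^3 = 0. \tag{29.13}$$

The time-scale for changes in the field velocity is

$$t_{\rm accel} \sim \sqrt{\phi/\ddot\phi} \sim \sqrt{\frac{\hbar}{\lambda c}}\,\frac{1}{\phi}. \tag{29.14}$$

After one acceleration time-scale the field will acquire a velocity

$$\dot\phi \sim \ddot\phi\, t_{\rm accel} \sim \sqrt{\frac{\lambda c}{\hbar}}\,\phi^2. \tag{29.15}$$

The kinetic energy term in the density is

$$\frac{\dot\phi^2}{c^2} \sim \frac{\lambda}{\hbar c}\phi^4 \tag{29.16}$$
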

so the potential and kinetic energy terms in the density are comparable. Now the condition that the
friction should still be negligible is $3H\dot\phi \ll (\lambda c/\hbar)\phi^3$. Using (29.12), this inequality becomes

$$3H\dot\phi \sim 3\sqrt{\frac{8\pi G\lambda\phi^4}{3\hbar c^3}}\,\sqrt{\frac{\lambda c}{\hbar}}\,\phi^2 \ll \frac{\lambda c}{\hbar}\phi^3. \tag{29.17}$$

The dimensionless interaction strength factors out of this inequality, so the condition that the friction
be negligible is simply that the field be sufficiently weak ($\phi \ll \sqrt{c^4/G}$).

Conversely, the condition that the friction should dominate (the slow-roll condition) is

$$\phi \gg \sqrt{\frac{c^4}{G}}. \tag{29.18}$$

We can write this inequality in a rather more revealing manner if we note that the quantity $\sqrt{\hbar/c^3}\,\phi$
has dimensions of mass, so we need

$$\sqrt{\frac{\hbar}{c^3}}\,\phi \gg \sqrt{\frac{\hbar c}{G}} = m_{\rm pl} \tag{29.19}$$

where $m_{\rm pl}$ is the Planck mass. In natural units ($\hbar = c = 1$) this says that if the field is much greater
than the Planck mass then the friction will dominate and the field will be unable to roll freely down
the potential, rather it will roll slowly down the hill at the terminal velocity

$$\dot\phi = -\frac{4\lambda c}{3\hbar H}\phi^3. \tag{29.20}$$


Assuming that the inequality (29.18) holds, what is the equation of state, or equivalently how
large is the positive kinetic energy term $\sim \dot\phi^2/c^2$ in the pressure as compared to the potential term
$V = \lambda\phi^4/\hbar c$? Squaring the terminal velocity (29.20) and using the inequality $H^2 > (8/3)\pi G V/c^2$
yields

$$\frac{\dot\phi^2}{c^2} < \frac{\lambda\phi^4}{\hbar c} \times \frac{2}{3\pi}\frac{c^4}{G\phi^2}. \tag{29.21}$$

The first factor here is just the potential and, under the slow-roll condition (29.18), the second factor
is much less than unity, so this says that

$$\frac{\dot\phi^2}{c^2} \ll V. \tag{29.22}$$

The kinetic energy terms in the pressure and density are therefore much less than the potential
terms and we therefore have $P \simeq -\rho c^2$ as required for inflation to proceed.

As already mentioned, in a viable model, inflation must be sustained for many e-foldings in order
to solve the flatness and horizon problems. In one e-folding, the field will move a distance $\Delta\phi \sim \dot\phi/H$.
For GUT scale inflation, where we need $\sim 60$ e-foldings, we need

$$\frac{\Delta\phi}{\phi} \sim \frac{\dot\phi}{H\phi} \lesssim \frac{1}{60} \equiv \epsilon. \tag{29.23}$$

Using (29.20) and $H^2 \sim GV/c^2 \sim G\lambda\phi^4/\hbar c^3$ this becomes

$$\phi^2 \gtrsim \frac{c^4}{\epsilon G}. \tag{29.24}$$

Thus, the field needs to exceed the Planck mass by at least a factor $\epsilon^{-1/2}$ in order to obtain
sufficiently many e-foldings of inflation.
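
The order-of-magnitude argument can be compared with a direct numerical integration of the field equation. The following is a minimal sketch in natural units $\hbar = c = m_{\rm pl} = 1$ (so that $G = 1$), with $\lambda = 1$ chosen purely for convenience and the field started at rest. It integrates $\ddot\phi + 3H\dot\phi + 4\lambda\phi^3 = 0$ together with $H^2 = (8\pi/3)(\dot\phi^2/2 + \lambda\phi^4)$ using a fourth-order Runge-Kutta step, accumulating the number of e-foldings $N = \int H\,dt$ until the expansion stops accelerating. The step size, the stopping criterion and all names are assumptions made for illustration.

```python
import math

def derivs(phi, phidot, lam):
    """Return (dphi/dt, dphidot/dt, dN/dt) with hbar = c = m_pl = 1."""
    rho = 0.5 * phidot ** 2 + lam * phi ** 4        # energy density
    hubble = math.sqrt(8.0 * math.pi * rho / 3.0)   # expansion rate, G = 1
    phiddot = -3.0 * hubble * phidot - 4.0 * lam * phi ** 3
    return phidot, phiddot, hubble

def count_efolds(phi0, lam=1.0, dt=1e-4, max_steps=2_000_000):
    """Integrate from phi = phi0 at rest; return (e-foldings, final field value)."""
    phi, phidot, n = phi0, 0.0, 0.0
    for _ in range(max_steps):
        # rho + 3P > 0 is equivalent to phidot^2 > lam phi^4: acceleration has ended.
        if phidot ** 2 > lam * phi ** 4:
            break
        k1 = derivs(phi, phidot, lam)
        k2 = derivs(phi + 0.5 * dt * k1[0], phidot + 0.5 * dt * k1[1], lam)
        k3 = derivs(phi + 0.5 * dt * k2[0], phidot + 0.5 * dt * k2[1], lam)
        k4 = derivs(phi + dt * k3[0], phidot + dt * k3[1], lam)
        phi += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        phidot += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        n += dt * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]) / 6.0
    return n, phi

for phi0 in (2.0, 3.0, 4.5):
    n, phi_end = count_efolds(phi0)
    print(f"phi0 = {phi0} m_pl: N = {n:.1f}, slow-roll estimate pi*phi0^2 = {math.pi * phi0**2:.1f}")
```

In the slow-roll limit the same equations give $dN = -2\pi\phi\,d\phi$ in these units, so $N \simeq \pi\phi_0^2$ and of order 60 e-foldings requires a starting value of a few Planck masses, in line with the estimate above.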

In this model, the field rolls slowly (very slowly at first) down the potential and the universe
inflates. The expansion rate $H$ does not remain precisely constant, but decreases slowly with time.
Eventually the field reaches the value $(\hbar/c^3)^{1/2}\phi \sim m_{\rm pl}$, at which point the friction term $H\dot\phi$ in the
equation of motion is no longer effective and the field starts to oscillate freely about the potential
minimum. As discussed previously, the field amplitude will damp adiabatically, and the density will
fall as $\rho \propto 1/a^3$; just as in a matter dominated universe. In the particle description, a macroscopic
coherent field with $\nabla\phi = 0$ is a huge number of zero momentum particles (a ‘Bose condensate’ if
you will) and therefore behaves just like non-relativistic particles. In more detail, one finds that
the pressure for such a system oscillates about zero, being positive when $\phi = 0$ and $\dot\phi$ is maximized at
the bottom of the potential and being negative at the limits of the field excursion when $\dot\phi = 0$. This
is not what we want, which is a transition to a radiation dominated cosmology. If, however, there are
other fields $\chi$ in the Lagrangian, and if these are coupled to the inflaton field $\phi$, then once the $\phi$ field
starts oscillating, it can decay into $\chi$-ons, provided these have a mass which is lower than that of the
$\phi$-field. The details of this re-heating process depend on the details of the interactions between the
fields. In figure 29.1 we have assumed that re-heating happens promptly once the inflaton starts to
oscillate. It is presumably possible, if the interactions are sufficiently weak, that the universe might pass
through a period of matter domination after the end of inflation before re-heating occurs.

We have considered a rather specific model above, with a $\lambda\phi^4$ potential. The main results are
not specific to this choice. Had we instead assumed $V = (m^2c^2/\hbar^2)\phi^2$ (i.e. a non-interacting, but
massive, field) then we would again find that the ‘slow-roll’ condition is simply that $\phi \gg \sqrt{c^4/G}$.


29.4 Discussion 

The ‘inflationary scenario’ described above provides a plausible mechanism for preparing the universe 
in a flat and spatially homogeneous state on all scales that we can observe, starting from fairly generic 
initial conditions (rather than absurdly finely tuned ones). All we require is that the field start off at 
a sufficiently high value; the initial value of the field velocity $\dot\phi$ is largely irrelevant, since the cosmic
drag term rapidly reduces $\dot\phi$ to the terminal velocity. The empirical evidence strongly encourages
one to suspect that our universe has passed through such a phase. The situation facing cosmologists
is rather like that facing a policeman who, walking down the high street, finds a jewelry shop with
a broken window. Now it could be that this is a random accident, but the circumstantial evidence
that it is a jewelry shop rather than a grocery or shoe shop strongly encourages one to believe that
this is not an accident but a crime.

As we shall discuss later, the inflationary scenario also creates density fluctuations which can seed 
the structures we see in the distribution of galaxies and in the cosmic microwave background. The 
amplitude of these fluctuations is strongly model dependent, but the prediction is for fluctuations 
with dependence on wavelength very much like that which seems to be required.

These results make the inflationary model highly attractive. On the down side, one has to invoke 
a new field, the inflaton, precisely to obtain these desirable results. Initially, the development of this 
field of research was strongly linked to developments in fundamental particle physics — spontaneous 
symmetry breaking etc. - but the subject has now taken on a life of its own. While we have used 
GUT-scale inflation in order to derive e.g. the number of e-foldings, there is really no need to assume 
this (though reheating to super-GUT temperatures would be problematic). Indeed, studies of the 
expansion rate using supernovae have suggested that the universal expansion is now accelerating; 
it would seem that we are entering another inflationary phase. The ideas described above can 
readily be re-cycled to describe late-time inflation by choosing appropriate parameters (specifically, 
this requires that the fields be very light). The inflaton field must be coupled to other fields in 
order to allow re-heating, and in principle this allows empirical tests of the theory. However, the 
requirements on the form and strength of the interaction are not very specific, and the energies 
required to make GUT-scale inflatons are beyond the reach of terrestrial particle accelerators. Aside
from the ‘predictions’ of flatness, homogeneity and density fluctuations (all of which were observed
before inflation was invented) it is hard to find testable predictions. One hope is that the inflaton
field and its potential will emerge as the low-energy limit of some more fundamental theory which unifies
all of the forces, including gravity. This is an area of much activity at present, and hopefully will
explain why there is an inflaton; why it has the potential it needs; why the minimum of the potential
is at zero energy density and so on.


There is another rather unsettling aspect of the inflationary scenario, which is that we had 
to assume that the field is highly homogeneous. Many discussions of the subject argue that any 
inhomogeneity will be stretched to super-horizon scale in order to justify this assumption. This 
seems overly complacent. Recall that in figure 29.1 the boundary of the domain which we can
describe without a theory of quantum gravity is not a fixed time $t = t_{\rm pl}$, rather the time at which a
region starts to be describable classically depends on the size of the region. Each time the universe 
doubles in size, each Planck-scale region gets replaced by eight new Planck-scale volumes. Predicting 
the ‘initial’ state of such regions requires a quantum theory of gravity, but it is commonly imagined 
that the classical universe emerges from some chaotic space-time foam. Now even if this process 
were to generate quite small occupation numbers for these Planck-scale modes, this would give a 
positive contribution to the pressure which would stop inflation taking place. If we want to invoke 
inflation then we must assume that this quantum-gravitational process produces a very rare vacuum.


There is one other peculiar feature of a potential dominated medium that needs mention. We 
have developed the theory here simply as we did for the FRW models, with the sole modification 
being the adoption of the equation of state $P = -\rho c^2$. There is, however, an important distinction
to be drawn between such a medium and a fluid with $P = \rho c^2/3$ or $P = 0$ say. In the latter
cases there is a preferred set of observers (the ‘comoving observers’) for whom the stress
energy tensor takes the symmetric form $T^{\mu\nu} = {\rm diag}(\rho c^2, P, P, P)$. At each point in space, this zero
momentum density condition picks out a unique velocity, and this gives us a unique ‘congruence’ 
of comoving observers. We can clearly determine unambiguously whether the universe is expanding 
or contracting by having such observers exchange light signals and measure red-shifts, for instance. 
In the inflationary phase, in contrast, when $P = -\rho c^2$ to high accuracy, there is no such unique
congruence of comoving observers, since with $P = -\rho c^2$ the stress energy tensor has the same form
in all inertial frames. One can construct a set of test particle world-lines which are exponentially
expanding, as we have done here, and these observers would say that mass-energy is being created
spontaneously by the universal expansion. However, one can also find a set of test particles whose
world-lines are initially converging (the acceleration equation only tells us that $\ddot a > 0$, and one can
have test particles with $\dot a < 0$ initially). Such observers would not agree that mass-energy is being
created. One can also construct a congruence of world-lines which are tilted (i.e. in a state of motion)
with respect to our comoving observers, and they would also see vanishing momentum density for
the scalar field. The usual response to this is to argue that the pressure is not precisely $P = -\rho c^2$,
rather there will be a small correction, either due to the field velocity $\dot\phi$ or due to the presence of
other matter fields, which will break the exact invariance of $T^{\mu\nu}$ under Lorentz boosts.


Finally, in the introduction, we raised the question ‘what happened before the initial singularity?’. 
In the standard hot big bang, the initial singularity is unavoidable, and such a question is probably 
best answered by saying it is meaningless. With the inflationary equation of state, we have seen that 
the general solution for the expansion factor is the sum of exponentially growing and decaying terms 
(29.4). Generically one would imagine that the coefficients of both terms would be non-zero, in which
case, at very early times, the negative exponential term would come to dominate and the universe
would be collapsing rather than expanding. One might therefore claim that in inflationary models one
can have an initially collapsing universe which contracts to a minimum size and then bounces, and
one could similarly argue for the possibility of repeated cycles of expansion and contraction. In
the context of the models developed here, however, this is a misconception. There is a qualitative
difference in the behavior of a scalar field in a contracting universe as compared to the expanding
model considered above. In the latter case, $H$ is positive and the term $3H\dot\phi$ in the equation of
motion is a friction, and the evolution of the field will relax towards the slowly rolling terminal 
velocity solution. In a collapsing phase H is negative, so we have negative friction. In this case 
the slowly rolling solution — while possible, since the system as a whole is time symmetric — is an 
unstable one. For generic initial conditions going into a ‘big-crunch’ we do not expect the field to 
become potential dominated, and so the inflationary equation of state will not arise. 

29.5 Problems

29.5.1 Inflation

Derive the general form of the solution for the scale factor $a(t)$ in an inflationary universe, i.e. one
in which $P = -\rho c^2$.

Chapter 30

Observations in FRW Cosmologies 


30.1 Distances in FRW Cosmologies 

The distance to an object is determined by two quantities: the current curvature radius $a_0$, which
sets the overall scale, and the comoving distance $\omega$. Observationally the more accessible quantities
are the expansion rate $H_0$ and the redshift $z$, the latter being rather accurately measurable from the
shift of lines in the spectrum. In the next two subsections we show how these two sets of variables 
are related. 


30.1.1 Scale Factor vs Hubble Parameter 


The energy equation tells us that 

$$H^2 = \frac{8}{3}\pi G\rho \pm \frac{c^2}{a^2} \tag{30.1}$$

with the positive sign for open cosmologies and vice versa. With the definition of the critical density 
and the density parameter $\Omega$, this provides a connection between the scale factor and the Hubble
parameter. In what follows, I shall assume an open cosmology (from which the Einstein-de Sitter 
results are obtained as the limiting case). The results for closed cosmologies are almost identical, 
save for appropriate sign changes and replacing hyperbolic trigonometric functions by their regular 
counterparts. 

If we specify the present matter density $\rho_0$ and expansion rate $H_0$, or equivalently the expansion
rate and density parameter $\Omega_0$, then the present value of the scale factor is

$$a_0 = \frac{c/H_0}{\sqrt{1 - \Omega_0}}. \tag{30.2}$$

Note that $a_0 \to \infty$ for $\Omega_0 \to 1$; we will return to this presently.


30.1.2 Redshift vs Comoving Distance 


Now consider the redshift $z$. As already discussed, the observed wavelength for a photon which left
a comoving source at time $t$ is equal to the wavelength at emission times the factor by which the
scale factor has grown. Defining the redshift as $1 + z = \lambda_{\rm obs}/\lambda_{\rm em}$ we have

$$1 + z = \frac{a_0}{a(t_{\rm em})}. \tag{30.3}$$

This gives $z = z(t_{\rm em})$ if the expansion history is known. The comoving distance $\omega$ of the source can
also be related to the time of emission, and hence to the redshift, since

$$\omega = \int_{t_{\rm em}}^{t_0} \frac{c\,dt}{a(t)}. \tag{30.4}$$
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To make this connection between w and z it is most convenient to work in conformal time rj defined 
such that dr] = cdt/a , and for which dui = —dp. The energy equation, written in terms of a 1 = da/dp 
rather than a = da/dt is 


a = 


87 iGpa A 
3c 2 


(30.5) 


To evolve this backwards in time we need to know how the density $\rho$ changes with time. This is
given by the equation of continuity, $\dot\rho = -3(\dot a/a)(\rho + P/c^2)$, so $\rho' = d\rho/d\eta$ is

$$ \rho' = -3\frac{a'}{a}\left(\rho + P/c^2\right). \qquad (30.6) $$

Equations (30.5) and (30.6), together with a specification of the ‘equation of state’ for the matter
content, can be integrated backwards in time to give $a(\eta)$ and $\rho(\eta)$. Specifically, one might specify
the current expansion rate $H_0$ and the current density parameters $\Omega_m$, $\Omega_r$, $\Omega_\Lambda$ etc. for the different
constituents (which have $P = 0$, $P = \rho c^2/3$, $P = -\rho c^2$ respectively). The final scale factor is then
given by (30.2), with $\Omega_0 = \Omega_m + \ldots$ the total density parameter. Solving these equations for
$a(\eta) = a(-\omega)$ (taking $\eta_0 = 0$) gives the redshift $z(\omega) = a_0/a(-\omega) - 1$, and inverting this gives $\omega$,
the comoving distance, as a function of redshift. In general this must be done numerically, though
this is quite straightforward.

While these integrations cannot generally be performed analytically, there are several simple 
and illuminating special cases, and the more general behavior can be qualitatively understood by 
interpolating between these cases. 

First, consider a very low density universe: $\Omega \ll 1$. In this case we can neglect the density entirely
and (30.5) says $a' = a$, or $da/a = d\eta$. Integrating this up gives $\log(a_0/a) = -\eta$, or, with $\omega = -\eta$
and using the definition of the redshift, $\omega(z) = \log(1 + z)$. As a quick sanity check note that
for $z \ll 1$ this gives $\omega \simeq z$, whereas (30.2) gives $a_0 \simeq c/H_0$ for $\Omega_0 \to 0$, and hence the local
distance-redshift relation is $d_l = a_0\omega \simeq cz/H_0$. For high redshift the comoving distance increases
without limit in this model, but only logarithmically. However, this is somewhat academic, since $\Omega_m$
is almost certainly not much less than 0.05, so the Universe was almost certainly matter dominated at
some not too distant (i.e. logarithmically recent) time in the past.

Another case of much interest is the flat Universe, in which case we can ignore the second
term in the square root sign in (30.5). Consider the case $\Omega = \Omega_m = 1$: The density is then
$\rho = \rho_0(a_0/a)^3$ and (30.5) becomes $da/\sqrt{a} = (H_0a_0^{3/2}/c)\,d\eta$. Integrating this and dividing by $\sqrt{a_0}$
gives $2(1 - \sqrt{a/a_0}) = H_0a_0\omega/c$. This is somewhat awkward since, as already mentioned, $a_0 \to \infty$
for a flat universe. However, in this case, where $S_k(\omega) = \omega$, the metric is

$$ ds^2 = a^2(\eta)\left(-d\eta^2 + d\omega^2 + \omega^2d\sigma^2\right). \qquad (30.7) $$

This is invariant if we re-scale $a \to a' = \alpha a$, with $\alpha$ some constant, provided we also re-scale
comoving distance $\omega \to \omega' = \omega/\alpha$, and similarly re-scale the conformal time $\eta$. If we choose $\alpha$ such
that $a_0 = 2c/H_0$ and drop the prime we have

$$ \omega = 1 - \frac{1}{\sqrt{1 + z}}. \qquad (30.8) $$

Note that for $z \ll 1$ we have $\omega \simeq z/2$ so the local distance-redshift relation is $d_l = a_0\omega \simeq cz/H_0$
just as before (the normalization of the present scale factor $a_0 = 2c/H_0 \simeq 8000$ Mpc was chosen for
this reason). Note also that the comoving distance tends to a finite limit $\omega = 1$ as $z \to \infty$. This is
the distance to the horizon.


Finally, consider the case $\Omega = \Omega_\Lambda = 1$. In this case the density is constant and (30.5) becomes
$a' = H_0a^2/c$ or $da/a^2 = (H_0/c)\,d\eta$. Integrating this, and with $a_0 = 2c/H_0$, we have

$$ \omega = z/2. \qquad (30.9) $$

This again has the same local distance-redshift relation $d_l = a_0\omega = cz/H_0$ as the other cases, and here
the comoving distance increases without limit as $z \to \infty$, reflecting the fact that the $\Lambda$-dominated
universe has no horizon. Again, this is somewhat academic since we know that the matter density is
not negligible. Figure 30.1 shows the comoving distance for three plausible choices of cosmological
parameters.

Figure 30.1: Comoving distance is plotted versus redshift for three illustrative models.
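The three limiting cases above can be checked with a short numerical sketch. It does not integrate
(30.5) and (30.6) in conformal time as described in the text; instead it assumes the equivalent redshift
form $a_0\omega = (c/H_0)\int_0^z dz'/E(z')$, with $E = H/H_0$, which follows from $d\omega = -d\eta$ and $1 + z = a_0/a$.
The function names `E` and `distance` and the choice of Simpson's rule are illustrative assumptions,
not part of the original notes.

```python
import math

def E(z, om_m=0.0, om_r=0.0, om_l=0.0):
    """Dimensionless expansion rate H(z)/H0 for the given density parameters."""
    om_k = 1.0 - (om_m + om_r + om_l)  # curvature term, zero for a flat model
    x = 1.0 + z
    return math.sqrt(om_m * x**3 + om_r * x**4 + om_l + om_k * x**2)

def distance(z, n=2000, **omegas):
    """a0*omega in units of c/H0: Simpson's rule for the integral of dz'/E(z') from 0 to z."""
    h = z / n  # n must be even for Simpson's rule
    total = 1.0 / E(0.0, **omegas) + 1.0 / E(z, **omegas)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0
        total += weight / E(i * h, **omegas)
    return total * h / 3.0

z = 3.0
# Each line prints the numerical value next to the closed form from the text.
print(distance(z), math.log(1.0 + z))                                 # empty open model
print(distance(z, om_m=1.0), 2.0 * (1.0 - 1.0 / math.sqrt(1.0 + z)))  # Einstein-de Sitter, (30.8)
print(distance(z, om_l=1.0), z)                                       # Lambda dominated, (30.9)
```

The closed forms are quoted in units of $c/H_0$, so (30.8) and (30.9), which were written for
$a_0 = 2c/H_0$, appear multiplied by two.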


30.2 Angular Diameter and Luminosity Distances 


If we place identical copies of some extended spherical source at various distances then it will have
an angular size $\Delta\theta$ and apparent luminosity (i.e. bolometric flux density) $F$ which are functions of
the redshift. For small redshifts the angular size scales inversely with distance $D$ and the luminosity
scales as $1/D^2$. It is useful to define an angular diameter distance $D_a(z)$ such that $\Delta\theta \propto 1/D_a(z)$
for all $z$, and a luminosity distance $D_l(z)$ such that $F \propto 1/D_l^2$.

To obtain $D_a$, consider a small spherical object of physical radius $\Delta r$ (figure 30.2). The metric
(28.35) tells us that the size is related to the angular radius by $\Delta r = a(z)\sinh\omega\,\Delta\theta$. The angular
diameter distance is then defined such that $\Delta r = D_a\Delta\theta$, or

$$ D_a(z) = \frac{\Delta r}{\Delta\theta} = a(z)\sinh\omega(z). \qquad (30.10) $$

For low redshift $a(z) \simeq a_0$ and $D_a \simeq a_0\sinh\omega$, and this increases linearly with redshift. However,
for $z \gtrsim 1$ the scale factor becomes important. For the Einstein-de Sitter model $a \propto \eta^2 = (1 - \omega)^2$
so $D_a \propto (1 - \omega)^2\omega$. This is maximized for $\omega = 1/3$, corresponding to $z = 5/4$, and at larger redshift
objects of a given physical size appear larger in angle with increasing distance. For the $\Lambda$-dominated
model $\omega \propto z$, while $a \propto 1/(1 + z)$, so in this case $D_a$ tends asymptotically to a constant value as $z \to \infty$.
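The location of the maximum in the Einstein-de Sitter case can be confirmed with a brute-force scan.
This is only a sketch: it assumes the flat-space limit in which $\sinh\omega$ is replaced by $\omega$, uses (30.8) with
$a_0 = 2c/H_0$, and the helper names `omega_eds` and `d_a_eds` are hypothetical.

```python
import math

def omega_eds(z):
    """Comoving distance (30.8) for the Einstein-de Sitter model, with a0 = 2c/H0."""
    return 1.0 - 1.0 / math.sqrt(1.0 + z)

def d_a_eds(z):
    """Angular diameter distance a(z)*omega(z) in units of c/H0 (flat model)."""
    return 2.0 * omega_eds(z) / (1.0 + z)

# Scan a uniform redshift grid from z = 0.001 to z = 10 for the largest D_a.
grid = [0.001 * i for i in range(1, 10001)]
z_peak = max(grid, key=d_a_eds)
print(z_peak, omega_eds(z_peak), d_a_eds(z_peak))
```

The scan should land on the grid point nearest $z = 5/4$, where $\omega = 1/3$ and $D_a = (8/27)\,c/H_0$.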

A simple way to establish the luminosity distance $D_l$ is to consider a black-body source with
rest-frame temperature $T_{\rm em}$. The intrinsic bolometric luminosity is proportional to the area times
the fourth power of the temperature, $L \propto \Delta r^2T_{\rm em}^4$ (the Stefan-Boltzmann law). The observed
temperature is lower than $T_{\rm em}$ because of the redshift: $T_{\rm obs} = T_{\rm em}/(1 + z)$, and the bolometric flux
density is proportional to the product of the solid angle $\Delta\Omega$ and $T_{\rm obs}^4$. But $\Delta\Omega = \pi\Delta r^2/D_a^2$, so the
flux density is $F \propto L(1 + z)^{-4}/D_a^2$, or $F \propto L/D_l^2$ with

$$ D_l(z) = (1 + z)^2D_a(z). \qquad (30.11) $$

Figure 30.2: A source of physical radius $\Delta r$ lies at a comoving distance $\omega$ from an observer. According
to the metric (28.35), the angle and object size are related by $\Delta r = a\sinh\omega\,\Delta\theta$ where $a$ is the scale
factor at the time the light left the source. The angular diameter distance is therefore
$D_a = a(z)\sinh\omega(z)$.

Figure 30.3: Angular diameter distance is plotted versus redshift for three illustrative models.

Figure 30.4: Luminosity distance is plotted versus redshift for three illustrative models.


More generally, from measurement of the monochromatic flux density $F_\nu$ at observer frequency
$\nu$ we can infer the luminosity $L_{\nu'}$ at rest-frame frequency $\nu' = (1 + z)\nu$ as follows: First, note that
an observer cosmologically close to the source, i.e. at a distance $r \ll c/H$, will see surface brightness

$$ I_{\nu'} = \frac{L_{\nu'}}{(4\pi r^2)(\pi\Delta r^2/r^2)} = \frac{L_{\nu'}}{4\pi^2\Delta r^2}. \qquad (30.12) $$

Now the transformation law for the surface brightness is $I_\nu/\nu^3 = {\rm constant}$, or
$I_\nu = I_{\nu'}/(\nu'/\nu)^3 = I_{\nu'}/(1 + z)^3$. The observed flux density is then

$$ F_\nu = \int d\Omega\,I_\nu = \pi\left(\frac{\Delta r}{D_a}\right)^2I_\nu = \frac{L_{\nu'=(1+z)\nu}}{4\pi D_a(z)^2(1 + z)^3}. \qquad (30.13) $$

Turning this around gives

$$ L_{\nu'} = 4\pi D_a^2(1 + z)^3F_{\nu'/(1+z)}. \qquad (30.14) $$

Note that $d\nu = d\nu'/(1 + z)$ so

$$ F_\nu\,d\nu = \frac{L_{\nu'}\,d\nu'}{4\pi D_a(z)^2(1 + z)^4} = \frac{L_{\nu'}\,d\nu'}{4\pi D_l(z)^2} \qquad (30.15) $$

and integrating over all frequency gives

$$ F = \int d\nu\,F_\nu = \frac{L}{4\pi D_l(z)^2} \qquad (30.16) $$

in accord with the result obtained for a black-body emitter.

30.3 Magnitudes and Distance Moduli


Astronomers like to quote flux density in the magnitude scale, defined such that
$m = -2.5\log_{10}F + {\rm constant}$. Usually, fluxes are measured through some standard filter. The flux
in filter band $b$ is then

$$ F_b = \frac{\int d\nu\,F_\nu R_b(\nu)}{\int d\nu\,R_b(\nu)} \qquad (30.17) $$

where $R_b(\nu)$ is the dimensionless transmission function describing the throughput of the filter. The
reason for the factor 2.5 is historical, but for rough calculations it is a happy coincidence that
$2.5\log_{10}$ is very close to the natural logarithm. Operationally, the magnitude scale is conveniently
calibrated by measuring the magnitude relative to some standard object. In the Vega magnitude
scale, the star Vega has magnitude $m_b$ identically zero in all passbands. Traditionally, the apparent
magnitude of an object is then $-2.5$ times the log of its flux density relative to that of Vega:

$$ m_b = -2.5\log_{10}\left[\frac{\int d\nu\,F(\nu)R_b(\nu)}{\int d\nu\,F_{\rm Vega}(\nu)R_b(\nu)}\right]. \qquad (30.18) $$

For nearby objects, and in the absence of absorption, the flux varies inversely with the square of
the distance $D$. One can then quote the distance to an object in magnitudes also. A star identical
to Vega, but ten times farther away will be 100 times fainter, so it will have an apparent magnitude
of +5. The distance modulus is defined as

$$ {\rm DM} = 5\log_{10}(D/10\,{\rm pc}) \qquad (30.19) $$

where one parsec is $1\,{\rm pc} \simeq 3.1 \times 10^{18}$ cm.
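As a quick numerical illustration of the definitions above, the sketch below evaluates a magnitude
difference and a distance modulus, and also the factor relating $2.5\log_{10}$ to the natural logarithm.
The function names are illustrative only.

```python
import math

def magnitude_difference(flux, flux_ref):
    """Magnitude of a source relative to a reference: m = -2.5 log10(F / F_ref)."""
    return -2.5 * math.log10(flux / flux_ref)

def distance_modulus(d_pc):
    """Distance modulus (30.19) for a distance given in parsecs."""
    return 5.0 * math.log10(d_pc / 10.0)

print(magnitude_difference(0.01, 1.0))  # source 100 times fainter than the reference
print(distance_modulus(100.0))          # ten times farther away than 10 pc
print(2.5 / math.log(10.0))             # ratio of 2.5*log10(x) to ln(x)
```

The last number is within about nine percent of unity, which is why a small magnitude difference is
roughly equal to the fractional difference in flux.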


30.4 K-Corrections 

For cosmologically nearby objects, and in the absence of absorption, the absolute magnitude $M_b$ in
some band $b$ is related to distance and apparent magnitude $m_b$ by

$$ M_b = m_b - {\rm DM}. \qquad (30.20) $$

For cosmologically distant objects things are more complicated, since if we observe in a filter with
central wavelength $\lambda$ then this receives photons which were emitted around wavelength
$\lambda' = \lambda/(1 + z)$, so the best we can really hope to obtain is the absolute magnitude in a filter $b'$
with transmission function $R_{b'}(\nu) = R_b(\nu/(1 + z))$.

There is an additional, often overlooked, subtlety which is worth mentioning here. Many detectors
such as CCDs better approximate photon counting systems than they do energy measuring devices.
A photon counting system cannot measure the integrated energy appearing in (30.18); what it
can measure is the flux of photons. The photon flux $n_\nu$ is equal to the energy flux divided
by the energy per photon: $n_\nu = F_\nu/h\nu$. For very narrow band filters the distinction is negligible,
but for the broad-band filters often used (because they are more efficient) it can be
important. We can, however, use the traditional definition of magnitudes if we simply replace $R_b(\nu)$
by $R'_b(\nu) = R_b(\nu)/\nu$. We will assume that this substitution has been made, and in what follows we
will drop the prime for clarity. With $F(\nu)\,d\nu = L(\nu' = (1 + z)\nu)\,d\nu'/4\pi D_l^2(z)$ the apparent magnitude
is then


$$ m_b = -2.5\log_{10}\left[\left(\frac{10\,{\rm pc}}{D_l}\right)^2\frac{\int d\nu\,L((1 + z)\nu)R_b(\nu)}{\int d\nu\,L_{\rm Vega}(\nu)R_b(\nu)}\right]. \qquad (30.21) $$

Changing integration variable in the upper integral to $\nu' = (1 + z)\nu$ and dropping the prime gives

$$ m_b = {\rm DM} - 2.5\log_{10}\left[\frac{\int d\nu\,L(\nu)R_b(\nu/(1 + z))}{(1 + z)\int d\nu\,L_{\rm Vega}(\nu)R_b(\nu)}\right] \qquad (30.22) $$

or as

$$ m_b = {\rm DM} - 2.5\log_{10}\left[\frac{\int d\nu\,L(\nu)R_b(\nu)}{\int d\nu\,L_{\rm Vega}(\nu)R_b(\nu)}\right] - 2.5\log_{10}\left[\frac{\int d\nu\,L(\nu)R_b(\nu/(1 + z))}{(1 + z)\int d\nu\,L(\nu)R_b(\nu)}\right]. \qquad (30.23) $$

The center term on the right hand side is the absolute magnitude $M_b$, so this is

$$ M_b = m_b - {\rm DM}(z) - k_b(z) \qquad (30.24) $$

where the so called k-correction is

$$ k_b(z) = -2.5\log_{10}\left[\frac{\int d\nu\,L(\nu)R_b(\nu/(1 + z))}{(1 + z)\int d\nu\,L(\nu)R_b(\nu)}\right]. \qquad (30.25) $$


The k-correction depends on the filter response function and on the spectral energy distribution 
(SED) of the source, and accounts for the red-shifting of the source spectrum. 

These correction functions have been computed for various types of galaxies so, if one has a good 
idea of what type of galaxy one is dealing with, one can obtain a rough estimate of the absolute 
luminosity in this way. One can do rather better than this. Imagine one observes a galaxy at redshift
$z = 0.6$ in the $I$-band. The central wavelength is $\lambda_I \simeq 8000$ Å, so in the rest-frame this corresponds
to $\lambda \simeq 5000$ Å, which is close to the central wavelength of the $B$-band. Thus $I$-band observations of
galaxies at this redshift provide one with the rest-frame $B$-band absolute magnitude with very small
galaxy-type-dependent corrections. Generalizing this, with $I \to b$ and $B \to b'$ we have


$$ M_{b'} = m_b - {\rm DM}(z) - k_{bb'}(z) \qquad (30.26) $$

where the generalized k-correction is

$$ k_{bb'}(z) = -2.5\log_{10}\left[\frac{\int d\nu\,L_{\rm Vega}(\nu)R_{b'}(\nu)}{\int d\nu\,L_{\rm Vega}(\nu)R_b(\nu)}\right] - 2.5\log_{10}\left[\frac{\int d\nu\,L(\nu)R_b(\nu/(1 + z))}{(1 + z)\int d\nu\,L(\nu)R_{b'}(\nu)}\right]. \qquad (30.27) $$

Chapter 31

Linear Cosmological Perturbation Theory


Having explored the idealized perfectly homogeneous FRW models we now explore departures from
perfectly uniform density and expansion using linear perturbation theory. We first consider
perturbations of the zero pressure models in §31.1. These are applicable at late times when the pressure is
negligible. These models are also valid at early times for sufficiently long wavelength perturbations
which are ‘outside the horizon’, and for which pressure gradients are negligible. In §31.2 we consider
the effects of pressure, which is important at early times for sub-horizon scale perturbations. We
discuss the different perturbation modes which are present when there are multiple coupled fluids
(such as the radiation and plasma) and we also discuss diffusive damping.

Having laid some of the theoretical groundwork we describe in §31.3 how ideas about the nature
of the matter perturbation have evolved over the years.

In the following chapter we will discuss the generation of cosmological perturbations. Specifically,
we consider three mechanisms: spontaneous generation of perturbations, quantum fluctuations in
inflation, and self-ordering fields.


31.1 Perturbations of Zero-Pressure Models 


31.1.1 The Spherical ‘Top-Hat’ Perturbation 


In a matter dominated background cosmology the simplest way to construct a perturbation is to
excise a sphere of matter and replace it with a smaller sphere of the same gravitational mass, as
illustrated in figure 31.1. This generates a ‘top-hat’ positive density perturbation. One can lay
down more than one such perturbation, provided the walls of the perturbations do not overlap, and
this simple type of model for inhomogeneity is sometimes referred to as the ‘Swiss-Cheese model’.
While highly idealized, this simple model illustrates most of the features of the perturbations of
arbitrary shape.

The trajectories $R(t)$ of comoving observers on the surface of a sphere containing mass $M$ obey
the energy equation

$$ \dot R^2 = 2GM/R + 2E \qquad (31.1) $$

with $E = {\rm constant}$. The solutions of this equation form a two parameter family; we can perturb
the energy $E$, and we can also change the time of the big-bang, or we can make some combined
perturbation. In either case, since the mass is fixed, we have $\rho(t)R(t)^3 = \rho'(t)R'(t)^3$, where primed
and un-primed quantities refer to the interior and exterior respectively. For a small perturbation
$\rho' = \rho(1 + \delta\rho/\rho)$, with $\delta\rho/\rho \ll 1$, to first order in the amplitude we have

$$ \frac{\delta\rho}{\rho} = -3\frac{\delta R}{R} \qquad (31.2) $$

where $\delta\rho/\rho$ indicates the fractional perturbation to the density at a given time, and similarly for
$\delta R/R$. This gives the density perturbation from the perturbation $\delta R$ to the trajectory. Now we
can also determine $\delta R$ from the perturbation $\delta t_R$ in the time $t(R)$ to arrive at a given radius $R$,
$\delta t_R = t'(R) - t(R)$, since $\delta R = -\dot R\,\delta t_R$ (see figure 31.1), and therefore

$$ \frac{\delta\rho}{\rho} = 3\frac{\dot R}{R}\delta t_R = 3H\delta t_R. \qquad (31.3) $$

Figure 31.1: One can generate a perturbation of a dust-filled cosmology by excising a sphere of
matter and replacing it with a smaller sphere of radius $R'$. The middle panel illustrates a decaying
perturbation produced by a ‘delayed bang’. The right panel shows the more interesting growing
perturbation that can be generated by perturbing the energy of the sphere. The space-time in the
gap is Schwarzschild.


If we make a perturbation by delaying the bang time within the sphere, but keeping the energy
unaltered, the time delay $\delta t_R$ is constant, $\delta t_R = \delta t_i$, so this produces a density perturbation with
$\delta\rho/\rho \propto H$, which says that the density perturbation will decay as $1/t$. For perturbations generated
in the early universe, these decaying mode density perturbations will be negligible at late times.

A more interesting way to perturb the universe is to keep the bang time fixed, but to perturb
the energy of the explosion. Now the interior will again resemble part of another FRW universe,
but with lower energy $E' = E + \delta E$ (the quantity $\delta E$ here being negative for a positive density
perturbation). The energy equation for the perturbed radius is

$$ \dot R'^2 = 2GM/R' + 2E + 2\delta E. \qquad (31.4) $$


Taking the square root we have, for the time taken to reach a radius $R$,

$$ t' = \int_0^R\frac{dR'}{\sqrt{2GM/R' + 2E + 2\delta E}} = t - \delta E\int_0^R\frac{dR'}{(2GM/R' + 2E)^{3/2}} + \ldots \qquad (31.5) $$

where, in the second step, we have made a Taylor expansion and where $\ldots$ denotes terms of 2nd
order and higher in $\delta E$.

The perturbation in the time to reach radius $R$ is then

$$ \delta t_R = -\delta E\int_0^R\frac{dR'}{(2GM/R' + 2E)^{3/2}} \qquad (31.6) $$

and from (31.3) the density perturbation is therefore

$$ \frac{\delta\rho}{\rho} = -3\,\delta E\,\frac{\dot R}{R}\int_0^R\frac{dR'}{(2GM/R' + 2E)^{3/2}}. \qquad (31.7) $$

We can identify two interesting limiting cases:

• At early times in all models (and for all time in the Einstein-de Sitter model) the energy
constant $E$ is negligible compared to $2GM/R$, the expansion velocity is $\dot R \simeq \sqrt{2GM/R}$, the
integral is $\int dR\,(2GM/R)^{-3/2} = (2/5)(2GM)^{-3/2}R^{5/2}$, so the density perturbation is

$$ \frac{\delta\rho}{\rho} = -\frac{6}{5}\frac{\delta E}{2GM/R}. \qquad (31.8) $$

This type of density perturbation therefore grows with time as $\delta\rho/\rho \propto R \propto t^{2/3} \propto (1 + z)^{-1}$.
This was first shown by Lifshitz.

• At late times in a low density universe the energy constant term dominates; the expansion
velocity $\dot R \to \sqrt{2E} = {\rm constant}$ while the integral is $(2E)^{-3/2}\int dR = (2E)^{-3/2}R$, with the net
result that the density perturbation becomes asymptotically constant:

$$ \frac{\delta\rho}{\rho} = -\frac{3}{2}\frac{\delta E}{E} \qquad (\Omega \ll 1). \qquad (31.9) $$

We say that the density perturbation ‘freezes out’ in a low density universe as it ‘peels away’
from the $\Omega = 1$ solution. For $\Omega \ll 1$ the expansion velocity $\dot a$ becomes asymptotically constant,
so the expansion rate decays as $H \propto 1/a$, and therefore $\Omega \sim G\rho/H^2 \propto 1/a$ also; this means
that $\Omega \propto (1 + z)$ in a low density universe, and therefore this freeze-out occurs at a redshift
$1 + z \sim 1/\Omega_0$, though the transition is, in reality, a gradual one.


Note that we could have easily guessed the growth rate by simply arguing that the perturbation
to the binding energy of the region is $\delta\phi \sim G\delta M/R \sim (\delta\rho/\rho)(G\rho R^2) \propto (\delta\rho/\rho)/R$, and is constant.

The phenomenon described here is often called ‘gravitational instability’, which is something of 
a misnomer. True, the density perturbation grows with time, but the perturbation to the binding 
energy is constant in time; the density contrast grows at just the rate required to maintain this. 
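The two limiting cases of the top-hat result can be checked by evaluating the integral in (31.7)
numerically. The sketch below is only an illustration: it works in units with $GM = 1$, uses a simple
midpoint rule, and the function names and the particular values of $R$, $E$ and $\delta E$ are assumptions
chosen for the test, not values from the text.

```python
import math

def time_integral(R, GM, E, n=20000):
    """Midpoint-rule estimate of the integral in (31.6): dR' / (2GM/R' + 2E)^(3/2) from 0 to R."""
    h = R / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h  # midpoints avoid evaluating at R' = 0
        total += (2.0 * GM / r + 2.0 * E) ** -1.5
    return total * h

def density_perturbation(R, GM, E, dE):
    """Fractional density perturbation from (31.7)."""
    R_dot = math.sqrt(2.0 * GM / R + 2.0 * E)
    return -3.0 * dE * (R_dot / R) * time_integral(R, GM, E)

GM, dE = 1.0, -1.0e-3  # dE negative corresponds to an overdensity

# Early-time limit, E = 0: compare with (31.8).
R = 2.0
print(density_perturbation(R, GM, 0.0, dE), -(6.0 / 5.0) * dE / (2.0 * GM / R))

# Late-time limit of a low density model, R much larger than GM/E: compare with (31.9).
E = 0.5
print(density_perturbation(1.0e6, GM, E, dE), -1.5 * dE / E)
```

In the second case the numerical value approaches the asymptote only slowly (the fractional
difference falls off roughly as $\ln R/R$), in line with the remark that the freeze-out is gradual.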

It is also of interest to calculate the perturbation to the expansion rate $H = \dot R/R$ since, as we
shall see, this is also a directly observable quantity. The expansion rate within the perturbation is

$$ H' = \sqrt{\frac{2GM}{R^3} + \frac{2(E + \delta E)}{R^2}} \qquad (31.10) $$

so, to first order in $\delta E$, the difference between perturbed and un-perturbed expansion rates is

$$ (\delta H)_R = H' - H = \frac{\delta E/R}{\sqrt{2GM/R + 2E}}. \qquad (31.11) $$

However, this, as we have been careful to indicate, is the perturbation to the expansion rate at a
fixed radius $R$. This is not what is observed, which is the perturbation to the expansion rate at a
given time. Since the perturbed sphere reaches radius $R$ at time $t + \delta t_R$ rather than $t$, the latter is

$$ (\delta H)_t = (\delta H)_R - \dot H\,\delta t_R. \qquad (31.12) $$

Now $\dot H = \ddot R/R - \dot R^2/R^2$, which, with $\ddot R = -GM/R^2$, is $\dot H = -(3GM/R^3 + 2E/R^2)$, and we have

$$ \left(\frac{\delta H}{H}\right)_t = \delta E\left[\frac{1}{2GM/R + 2E} - \left(\frac{3GM}{R} + 2E\right)\left(\frac{2GM}{R} + 2E\right)^{-1/2}\frac{1}{R}\int_0^R\frac{dR'}{(2GM/R' + 2E)^{3/2}}\right]. \qquad (31.13) $$

Again, this rather ugly expression becomes a lot simpler in the limits $2GM/R \gg E$ (i.e. $\Omega \simeq 1$) and
$2GM/R \ll E$ (i.e. $\Omega \ll 1$). In the former case we have

$$ \left(\frac{\delta H}{H}\right)_t = \frac{2}{5}\frac{\delta E}{2GM/R} \qquad (\Omega \simeq 1) \qquad (31.14) $$

and in the latter limit the fractional perturbation to the expansion rate vanishes.

Comparing with (31.8) for the amplitude of the density perturbation we see that for growing
mode perturbations in the Einstein-de Sitter background the perturbation to the Hubble rate is
just minus one third of the fractional density perturbation:

$$ \frac{\delta H}{H} = -\frac{1}{3}\frac{\delta\rho}{\rho}. \qquad (31.15) $$


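The fractional perturbation in the expansion rate therefore also grows as $\delta H/H \propto t^{2/3}$. The velocity
difference between a pair of particles which straddle the gap is, in this case,

$$ \delta v = H\,\delta R + R\,\delta H = HR\left(\frac{\delta R}{R} + \frac{\delta H}{H}\right) = -\frac{2}{3}HR\frac{\delta\rho}{\rho}. \qquad (31.16) $$

Since $HR \propto \sqrt{1/R}$, the peculiar velocity grows as $\delta v \propto \sqrt{R} \propto t^{1/3}$. Again, this growth rate can
be obtained by arguing that the potential $\delta\phi$ is constant in time, but $\delta\phi \sim \delta(v^2) \sim v\,\delta v$, hence
$\delta v \propto 1/v \propto \sqrt{R}$.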

To summarize, we have shown that there are two modes for spherical density perturbations in
a zero-pressure cosmology. The decaying modes have $\delta\rho/\rho \propto 1/t$. The more interesting growing
modes have fractional perturbations in the density and expansion rate which grow as $t^{2/3}$ for $\Omega \simeq 1$.
In low density models the density perturbations become asymptotically constant, and the expansion
rate perturbation asymptotically vanishes.

The above analysis is Newtonian. This is adequate, provided we are considering perturbations
at such times that $\dot R \ll c$, that is, perturbations which are smaller than the current horizon
scale. This way of modeling perturbations is, however, not restricted to this regime. In the case
of perturbations to a closed universe, for instance, the embedding diagram for this type of density
perturbation is a partial closed model within the perturbation, matching on to a shell of horn-like
Schwarzschild geometry which then matches smoothly onto the exterior, this being a closed model
with larger radius of curvature.


31.1.2 General Perturbations 


The spherical perturbations considered above are highly idealized. In general the density $\rho(\mathbf{x}, t)$
is some arbitrary function of physical position $\mathbf{x}$. The expansion velocity $\mathbf{v}_{\rm phys} = \dot{\mathbf{x}}$ will also, in
general, be some related function of $\mathbf{x}$ and $t$. In what follows it is more convenient to work in
comoving spatial coordinate $\mathbf{r} = \mathbf{x}/a(t)$, where $a(t)$ is the global scale factor (i.e. that for the
unperturbed cosmology). This is the same comoving coordinate we denoted by $\omega$ earlier. We define
the density perturbation

$$ \delta(\mathbf{r}, t) \equiv \frac{\delta\rho(\mathbf{r}, t)}{\bar\rho} = \frac{\rho(a\mathbf{r}, t) - \bar\rho(t)}{\bar\rho(t)} \qquad (31.17) $$

where $\bar\rho$ is the mean density. It is also more convenient to define a peculiar velocity

$$ \mathbf{v} = \mathbf{v}_{\rm phys} - H\mathbf{x}, \qquad (31.18) $$

which is the departure from uniform expansion, and to define a comoving peculiar velocity $\mathbf{u}$, which
is the rate of change of $\mathbf{r}$ with time:

$$ \mathbf{u} = \dot{\mathbf{r}} = \frac{d(\mathbf{x}/a)}{dt} = \frac{\dot{\mathbf{x}}}{a} - \frac{\dot a\mathbf{x}}{a^2} = \frac{\mathbf{v}_{\rm phys} - H\mathbf{x}}{a} = \frac{\mathbf{v}}{a}. \qquad (31.19) $$


Our experience with spherical perturbations suggests that we should be able to decompose a
general initial perturbation into growing and decaying components

$$ \delta(\mathbf{r}, t_i) = \delta^+(\mathbf{r}, t_i) + \delta^-(\mathbf{r}, t_i) \qquad (31.20) $$

where the split into growing and decaying modes is determined by the initial perturbation $\delta$ and
its rate of change $\dot\delta$. In the linear approximation, each of these modes then evolves completely
independently, with growth factors $D^\pm(t) = \delta^\pm(t)/\delta^\pm(t_i)$, and the final density perturbation is

$$ \delta(\mathbf{r}, t) = D^+(t)\delta^+(\mathbf{r}, t_i) + D^-(t)\delta^-(\mathbf{r}, t_i). \qquad (31.21) $$

This is not quite true, however. It turns out that in two or more dimensions there are modes
of a completely different character, which did not appear in our analysis of spherical perturbations
since they do not admit any spherically symmetric states. To see why, note that, at a given time,
we can make a Fourier transform and write the density perturbation field as a Fourier synthesis

$$ \delta(\mathbf{r}, t) = \sum_{\mathbf{k}}\delta_{\mathbf{k}}(t)e^{i\mathbf{k}\cdot\mathbf{r}}. \qquad (31.22) $$

That is, we are writing the density field as a sum of plane wave ripples with a pattern which is fixed
in comoving coordinates (we are assuming here that the wavelength of the perturbation is much
less than the curvature scale, so we can consider our spatial coordinates $\mathbf{r}$ to be flat-space Cartesian
coordinates). If we do the same thing to the velocity field

$$ \mathbf{u}(\mathbf{r}, t) = \sum_{\mathbf{k}}\mathbf{u}_{\mathbf{k}}(t)e^{i\mathbf{k}\cdot\mathbf{r}} \qquad (31.23) $$

then it is clear that, for each wave-vector, there are four degrees of freedom (the values of $\delta_{\mathbf{k}}$ and
$\mathbf{u}_{\mathbf{k}}$), not two. The spherical perturbations are special in that if we make a Fourier decomposition, the
velocity coefficients $\mathbf{u}_{\mathbf{k}}$ are parallel to the wave-vector $\mathbf{k}$. These are like longitudinal or compressional
sound waves. This is also the characteristic of potential flows. In addition to these so-called scalar
modes (not to be confused with scalar fields), there are vector modes. These have a quite different
character, and have non-zero transverse velocity (i.e. the velocity in the directions orthogonal to
the wave vector). These modes, for example, can have non-vanishing angular momentum, and are
often referred to as ‘torsional modes’. Henceforth we will simply ignore these vector modes, with
the rather lame justification that they are uninteresting since there are no growing vector modes.
That is, we will exclusively consider perturbations for which

$$ \mathbf{u}(\mathbf{r}, t) = \sum_{\mathbf{k}}\hat{\mathbf{k}}\,u_{\mathbf{k}}(t)e^{i\mathbf{k}\cdot\mathbf{r}}. \qquad (31.24) $$

The first step in deriving the growth rate for density perturbations is to note that the density
perturbation will give rise to a small perturbation $\delta\phi$ to the Newtonian gravitational potential via
Poisson’s equation

$$ \nabla_x^2\delta\phi = 4\pi G\bar\rho\delta. \qquad (31.25) $$

We use the subscript $x$ here to show that we are taking the gradient with respect to physical
coordinate $\mathbf{x}$, rather than say comoving coordinate $\mathbf{r}$. Applied to a plane-wave ripple
$\delta\phi = \phi_{\mathbf{k}}\exp(i\mathbf{k}\cdot\mathbf{r}) = \phi_{\mathbf{k}}\exp(i(\mathbf{k}/a)\cdot\mathbf{x})$ the Laplacian operator gives $\nabla_x^2\delta\phi = -k^2\delta\phi/a^2$ and therefore
Poisson’s equation becomes the algebraic equation

$$ k^2\phi_{\mathbf{k}} = -4\pi G\bar\rho a^2\delta_{\mathbf{k}}. \qquad (31.26) $$

The gradient of the potential perturbation $\delta\phi$ is the so-called peculiar gravity and causes particles’
world-lines to deviate from $\mathbf{r} = {\rm constant}$. However, as we will work in comoving coordinates, there
is a slight subtlety: Imagine a particle moving ballistically with no gravitational forces acting. The
particle’s physical velocity $\mathbf{v}_{\rm phys}$ will be constant, but as it moves it will be overtaking particles with
progressively higher Hubble velocity, so its peculiar velocity, i.e. its velocity relative to the comoving
observers it is passing, will decrease in time. If, at some instant, it is moving at velocity $\mathbf{v}$ past
observer A then after a time interval $\delta t$ it will have moved a distance $\delta\mathbf{x} = \mathbf{v}\delta t$, and will be passing
observer B who is receding from A with Hubble velocity $H\delta\mathbf{x}$, so B will see the particle passing with
velocity $\mathbf{v}' = \mathbf{v} - H\mathbf{v}\delta t$; this implies the equation of motion (in the absence of any peculiar gravity)

$$ \dot{\mathbf{v}} = -H\mathbf{v}. \qquad (31.27) $$

Thus, in an expanding coordinate system there appears to be a cosmic drag acting on the particle.
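The effect of the density perturbation is to subject the particle to an additional acceleration $-\nabla_x\delta\phi$
and the equation of motion is therefore

$$ \dot{\mathbf{v}} = -H\mathbf{v} - \nabla_x\delta\phi. \qquad (31.28) $$

In terms of $\mathbf{u} = \dot{\mathbf{r}} = \mathbf{v}/a$, for which $\dot{\mathbf{u}} = (\dot{\mathbf{v}} - H\mathbf{v})/a$,

$$ \dot{\mathbf{u}} = -2H\mathbf{u} - \frac{1}{a^2}\nabla_r\delta\phi \qquad (31.29) $$

where $\nabla_r = a\nabla_x$ denotes the derivative with respect to comoving coordinate $\mathbf{r}$. For a plane-wave
density ripple, for which $\nabla_r\delta\phi = i\mathbf{k}\phi_{\mathbf{k}}e^{i\mathbf{k}\cdot\mathbf{r}}$ and the velocity is given by (31.24), this gives

$$ \dot u_{\mathbf{k}} = -2Hu_{\mathbf{k}} - ik\phi_{\mathbf{k}}/a^2. \qquad (31.30) $$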


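The final step is to invoke the continuity equation to connect the velocity $\mathbf{u}$ and the rate of
change of $\delta$. In $\mathbf{x}, t$ coordinates the continuity equation is $\partial\rho/\partial t = -\nabla_x\cdot(\rho\mathbf{v}_{\rm phys})$. The density $1 + \delta$
in comoving coordinates similarly satisfies the continuity equation $\partial(1 + \delta)/\partial t = -\nabla_r\cdot((1 + \delta)\mathbf{u})$,
the linearized version of which is

$$ \dot\delta = -\nabla_r\cdot\mathbf{u}. \qquad (31.31) $$

For a single plane-wave density ripple this is

$$ \dot\delta_{\mathbf{k}} = -iku_{\mathbf{k}}. \qquad (31.32) $$

Taking the time derivative of (31.32), and using (31.30), we obtain

$$ \ddot\delta_{\mathbf{k}} = -ik\dot u_{\mathbf{k}} = ik(2Hu_{\mathbf{k}} + ik\phi_{\mathbf{k}}/a^2) = -2H\dot\delta_{\mathbf{k}} - (k^2/a^2)\phi_{\mathbf{k}} \qquad (31.33) $$

and using (31.26) to eliminate $\phi_{\mathbf{k}}$ we have

$$ \ddot\delta_{\mathbf{k}} + 2H\dot\delta_{\mathbf{k}} - 4\pi G\bar\rho\delta_{\mathbf{k}} = 0. \qquad (31.34) $$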


This is the equation governing the time evolution of a linearized density ripple of amplitude $\delta_{\mathbf{k}}$.
However, since $k$ does not appear in any of the coefficients, and $\delta_{\mathbf{k}}$ appears only at first order, we
can multiply by $e^{i\mathbf{k}\cdot\mathbf{r}}$ and sum over modes to obtain

$$ \ddot\delta + 2H\dot\delta - 4\pi G\bar\rho\delta = 0. \qquad (31.35) $$

This is a local equation governing the evolution of the density perturbation field $\delta(\mathbf{r}, t)$. It is second
order in time, so we need to specify both $\delta$ (or the displacement) and $\dot\delta$ (or the rate of change of the
displacement) as initial conditions.

One can readily show that for $\Omega = 1$, equation (31.35) admits the same growing and decaying
solutions obtained from the spherical model. Making the ansatz $\delta(\mathbf{r}, t) = AF(\mathbf{r})t^\alpha$, we have
$\dot\delta = \alpha AF(\mathbf{r})t^{\alpha-1} = \alpha\delta/t$ and $\ddot\delta = \alpha(\alpha - 1)AF(\mathbf{r})t^{\alpha-2} = \alpha(\alpha - 1)\delta/t^2$ which, in (31.35), and dividing by
$\delta$, becomes

$$ \alpha(\alpha - 1)/t^2 + 2\alpha H/t - 4\pi G\bar\rho = 0. \qquad (31.36) $$

Now in the Einstein-de Sitter model $a \propto t^{2/3}$ so $H = \dot a/a = 2/3t$ while $4\pi G\bar\rho = 3H^2/2 = 2/3t^2$ so,
multiplying the above equation by $t^2$, we have

$$ 3\alpha^2 + \alpha - 2 = 0 \qquad (31.37) $$

with solutions $\alpha = -1, +2/3$. The ‘eigen-modes’ of (31.35) are therefore $\delta^+(\mathbf{r}, t) \propto t^{2/3}$ and
$\delta^-(\mathbf{r}, t) \propto t^{-1}$. The general solution of (31.35) is then

$$ \delta(\mathbf{r}, t) = \delta^+(\mathbf{r}, t_i)\left(\frac{t}{t_i}\right)^{2/3} + \delta^-(\mathbf{r}, t_i)\left(\frac{t}{t_i}\right)^{-1}. \qquad (31.38) $$

Here $t_i$ is the initial time.

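Taking the time derivative and multiplying by $t$ gives

$$ t\dot\delta(\mathbf{r}, t) = \frac{2}{3}\delta^+(\mathbf{r}, t_i)\left(\frac{t}{t_i}\right)^{2/3} - \delta^-(\mathbf{r}, t_i)\left(\frac{t}{t_i}\right)^{-1} \qquad (31.39) $$

and solving the above equations for $\delta^+(\mathbf{r}, t_i)$ and $\delta^-(\mathbf{r}, t_i)$ yields

$$ \delta^+(\mathbf{r}, t_i) = \frac{3}{5}\left(\delta(\mathbf{r}, t_i) + t_i\dot\delta(\mathbf{r}, t_i)\right), \qquad \delta^-(\mathbf{r}, t_i) = \frac{2}{5}\left(\delta(\mathbf{r}, t_i) - \frac{3}{2}t_i\dot\delta(\mathbf{r}, t_i)\right). \qquad (31.40) $$

Given some initial $\delta(\mathbf{r})$, $\dot\delta(\mathbf{r})$, this tells us the amplitudes for the growing and decaying modes.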
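The mode decomposition can be tested against a direct numerical integration of (31.35) in the
Einstein-de Sitter background, where $H = 2/3t$ and $4\pi G\bar\rho = 2/3t^2$. The sketch below is illustrative:
the fixed-step Runge-Kutta integrator, the function names, and the initial conditions
($\delta = 1$, $\dot\delta = 0$ at $t_i = 1$) are assumptions made for the example.

```python
import math

def split_modes(delta, delta_dot, t_i):
    """Growing and decaying amplitudes at t_i from (31.40), Einstein-de Sitter."""
    growing = 0.6 * (delta + t_i * delta_dot)
    decaying = 0.4 * (delta - 1.5 * t_i * delta_dot)
    return growing, decaying

def analytic(t, delta, delta_dot, t_i):
    """General solution (31.38)."""
    g, d = split_modes(delta, delta_dot, t_i)
    return g * (t / t_i) ** (2.0 / 3.0) + d * (t / t_i) ** -1.0

def rhs(t, y):
    """First-order form of (31.35) with H = 2/(3t) and 4 pi G rho = 2/(3 t^2)."""
    delta, delta_dot = y
    return (delta_dot, -(4.0 / (3.0 * t)) * delta_dot + (2.0 / (3.0 * t * t)) * delta)

def integrate(delta, delta_dot, t_i, t_f, n=20000):
    """Fixed-step fourth-order Runge-Kutta integration from t_i to t_f; returns delta(t_f)."""
    h = (t_f - t_i) / n
    t, y = t_i, (delta, delta_dot)
    for _ in range(n):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, (y[0] + h / 2 * k1[0], y[1] + h / 2 * k1[1]))
        k3 = rhs(t + h / 2, (y[0] + h / 2 * k2[0], y[1] + h / 2 * k2[1]))
        k4 = rhs(t + h, (y[0] + h * k3[0], y[1] + h * k3[1]))
        y = (y[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             y[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        t += h
    return y[0]

print(split_modes(1.0, 0.0, 1.0))
print(integrate(1.0, 0.0, 1.0, 8.0), analytic(8.0, 1.0, 0.0, 1.0))
```

A perturbation that starts at rest is thus a 3:2 mixture of growing and decaying modes, and by
$t = 8t_i$ the growing mode already dominates.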

31.2 Non-zero Pressure and the Jeans Length 

So far we have considered $P = 0$; this is a good approximation when the Universe is matter dominated
and when the gas is decoupled from the radiation. However, prior to a redshift of about 1000
(known as the redshift of recombination or the redshift of decoupling) the gas was highly ionized,
and was tightly coupled to the photons of the microwave background by Thomson scattering. Prior
to decoupling it is important to allow for the effect of the pressure of the radiation.

31.2.1 Matter Dominated Era

We will first consider the behavior of perturbations in the relatively narrow window between $z_{\rm eq}$
and $z_{\rm dec}$. During this period, the density of the radiation-plasma fluid is dominated by the baryons,
$\rho \simeq \rho_b \gg \rho_{\rm rad}$, but the radiation provides a pressure $P = P_{\rm rad} = \rho_{\rm rad}c^2/3$.

Assuming the baryons and photons to be tightly coupled, the radiation density is proportional
to the 4/3 power of the matter density, or

$$ \rho_{\rm rad} = \frac{\bar\rho_{\rm rad}}{\bar\rho_b^{4/3}}\rho_b^{4/3} \qquad (31.41) $$

where $\bar\rho_{\rm rad}$, $\bar\rho_b$ are the mean densities of radiation and baryons. The sound speed is given by

$$ c_s^2 = \frac{dP}{d\rho} = \frac{c^2}{3}\frac{d\rho_{\rm rad}}{d\rho_b} = \frac{4c^2}{9}\frac{\bar\rho_{\rm rad}}{\bar\rho_b}. \qquad (31.42) $$

The sound speed is therefore on the order $c_s \sim c\sqrt{\bar\rho_{\rm rad}/\bar\rho_b} \propto a^{-1/2}$. We can define a sound horizon
to be the distance that sound waves can travel in the age of the Universe, or in one expansion time.
The comoving sound horizon is $r \sim c_st/a$. However, in the matter dominated era, $t \propto a^{3/2}$ and
hence the comoving sound horizon is constant.

The Fourier decomposition approach (unlike the spherical top-hat) is readily modified to incorporate pressure: the gravitational acceleration $-\nabla_x\delta\phi$ must be augmented by the pressure gradient acceleration $-\nabla_x P/\rho$ which, for linear perturbations, is $-c_s^2\nabla_x\delta$. Including this extra acceleration in (31.33), equation (31.34) becomes

$$\ddot\delta_k + 2H\dot\delta_k - (4\pi G\rho - c_s^2k^2/a^2)\delta_k = 0. \tag{31.43}$$


The new extra term here radically changes the behavior of the solutions. For small wavelength (high 
k) this term will dominate and the solutions will be oscillatory — these are simply adiabatic sound 
waves (though the pressure here comes from the radiation rather than the kinetic motion of the gas 
particles as in a conventional sound wave). 

One can define the Jeans wavelength

$$\lambda_J = 2\pi a/k_J = 2\pi c_s(4\pi G\rho)^{-1/2} \tag{31.44}$$

which separates the growing ($\lambda \gg \lambda_J$) modes from oscillatory ($\lambda \ll \lambda_J$) solutions. To order of magnitude, the Jeans length is just the distance a sound wave can propagate in the age of the universe ($\lambda_J \sim c_s t$) which is just the physical sound horizon.
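
To make (31.44) concrete, here is a minimal sketch that evaluates the Jeans wavelength in SI units. The function names are assumptions introduced for this example, and any sound speed or density passed in is the reader's own choice; no specific values are taken from these notes.

```python
import math

G = 6.674e-11  # Newton's constant in m^3 kg^-1 s^-2


def jeans_wavelength(c_s, rho):
    """Physical Jeans wavelength 2*pi*c_s / sqrt(4*pi*G*rho), in metres."""
    return 2.0 * math.pi * c_s / math.sqrt(4.0 * math.pi * G * rho)


def is_growing(wavelength, c_s, rho):
    """True when the mode is longer than the Jeans wavelength, so that
    gravity overcomes the pressure gradient and the mode can grow."""
    return wavelength > jeans_wavelength(c_s, rho)
```

The wavelength scales linearly with the sound speed and as the inverse square root of the density, which is the content of the order-of-magnitude statement $\lambda_J \sim c_s t$ once $t \sim (G\rho)^{-1/2}$ is used.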

Perturbations of wavelength $\lambda \ll \lambda_J$ will oscillate many times per expansion time (this assumes that diffusive damping is negligible; this will be considered later). However, the equation that these perturbations obey is not a simple harmonic oscillator, as it contains a damping term $2H\dot\delta$ and also the frequency of the oscillations $\omega \simeq c_s k/a$ is not constant. To determine the secular evolution of the perturbations (i.e. how the amplitude evolves with time) we proceed as follows: First we make a transformation from $\delta$ to an auxiliary field $\chi$ defined such that

$$\delta(\mathbf{r},t) = \chi(\mathbf{r},t)\,t^\alpha \tag{31.45}$$

with $\alpha$ constant. The partial time derivatives of $\delta$ appearing in (31.43) are then

$$\dot\delta = \dot\chi t^\alpha + \alpha\chi t^{\alpha-1} \tag{31.46}$$

$$\ddot\delta = \ddot\chi t^\alpha + 2\alpha\dot\chi t^{\alpha-1} + \alpha(\alpha-1)\chi t^{\alpha-2} \tag{31.47}$$

so, on multiplying by $t^{-\alpha}$ and using $4\pi G\rho = 3H^2/2$ (valid for the matter dominated background), (31.43) becomes

$$\ddot\chi + 2(\alpha/t + H)\dot\chi + \left[\frac{c_s^2k^2}{a^2} - \frac{3H^2}{2} + \frac{2H\alpha}{t} + \frac{\alpha(\alpha-1)}{t^2}\right]\chi = 0. \tag{31.48}$$


Now in the matter dominated era $a \propto t^{2/3}$, so $H = \dot a/a = 2/(3t)$, so, if we take $\alpha = -2/3$, the coefficient of $\dot\chi$ vanishes and we thereby eliminate the damping term. We then have

$$\ddot\chi + \left[\frac{c_s^2k^2}{a^2} - H^2\right]\chi = 0. \tag{31.49}$$


This is an un-damped oscillator $\ddot\chi + \Omega^2\chi = 0$ with time varying frequency $\Omega^2 = c_s^2k^2/a^2 - H^2$. We are here most interested in waves with $\lambda \ll c_s t$, since longer wavelengths will not have had time to oscillate, and this condition is equivalent to $k = 2\pi a/\lambda \gg Ha/c_s$. To a good approximation we can neglect the term $H^2$ in the frequency, so $\Omega \simeq c_s k/a$. The sound speed is $c_s \sim c\sqrt{\rho_{\rm rad}/\rho_b}$ which is proportional to $a^{-1/2}$, so $\Omega \propto a^{-3/2}$. The frequency decreases as the Universe expands, partly because the wave, having fixed comoving wavelength, is being stretched, and partly because the sound speed is decreasing.

In the limit that the time-scale for variation of the frequency is much greater than the period of oscillation (which is just the condition $k \gg Ha/c_s$ again) we can apply the principle of adiabatic invariance, which tells us that the amplitude of the fluctuations should scale as $\chi \propto 1/\sqrt{\Omega}$ or as $\chi \propto a^{3/4}$. The auxiliary field fluctuations therefore grow in amplitude. However, the actual density fluctuation is $\delta = \chi t^\alpha = \chi t^{-2/3} \propto \chi/a$, with the net result that the amplitude of acoustic fluctuations damps adiabatically as

$$\delta \propto a^{-1/4} \propto (1+z)^{1/4}. \tag{31.50}$$

Note that in obtaining this we assumed $a \propto t^{2/3}$, as is appropriate for a matter dominated cosmology. However, as the result was obtained using adiabatic invariance, it is independent of the details of the expansion, and would still apply if, for instance, there were some additional field present and affecting the expansion.
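
The scaling (31.50) can be cross-checked without the adiabatic-invariance argument. Assuming exactly $a \propto t^{2/3}$ and $c_s \propto a^{-1/2}$, every coefficient in (31.43) is a pure power of $t$: $H = 2/(3t)$, $4\pi G\rho = 2/(3t^2)$ and $c_s^2k^2/a^2 = K^2/t^2$, where $K$ is a dimensionless constant introduced here purely for illustration. The equation is then equidimensional and has power-law solutions $\delta \propto t^p$ with $p^2 + p/3 + K^2 - 2/3 = 0$. The sketch below (function names are assumptions of this example) solves that quadratic:

```python
import cmath


def growth_exponents(K):
    """Roots p of p^2 + p/3 + (K^2 - 2/3) = 0, for solutions delta ~ t^p."""
    disc = cmath.sqrt(1.0 / 36.0 - (K * K - 2.0 / 3.0))
    return (-1.0 / 6.0 + disc, -1.0 / 6.0 - disc)


def amplitude_index_in_a(K):
    """Exponent n in |delta| ~ a^n for the '+' root, using t ~ a^(3/2)."""
    p_plus, _ = growth_exponents(K)
    return 1.5 * p_plus.real
```

For $K = 0$ the roots are $2/3$ and $-1$, the pressure-free modes. For $K^2 > 25/36$ the roots are complex with real part $-1/6$, so $|\delta| \propto t^{-1/6} \propto a^{-1/4}$, in agreement with (31.50).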

31.2.2 Radiation Dominated Era 

The above Newtonian analysis is only valid for $P \ll \rho c^2$, and for perturbations which are smaller than the horizon. When the former condition breaks down, one cannot use the non-relativistic Euler and energy equations, and when the latter is broken one cannot use Newtonian gravity. Here we will consider the evolution of acoustic perturbations on scales much less than the horizon, or equivalently much less than the sound horizon, $\lambda \ll c_s t$, since, in the radiation era, the sound horizon $c_s t = ct/\sqrt{3}$ tracks the light horizon, and we will see how the adiabatic damping of such waves is modified.

Now for waves with $\lambda \ll c_s t$ we can safely neglect the gravity due to the perturbation since the pressure gradient acceleration is overwhelmingly larger. What we shall do here is to compute the evolution of such waves in a fictitious background cosmology in which gravity is also neglected. We do not believe that this is really valid; it is logically possible to have a radiation dominated cosmology with $\Omega \ll 1$, but this appears not to be the case for our Universe. However, since the waves evolve adiabatically, the evolution of the wave amplitude, as a function of the scale factor, is independent of the details of the expansion law. By this ruse we are able to compute the evolution using only special relativity and we are able to sidestep the complications of coupling a relativistic plasma to gravity.

The equations of motion are then simply

$$T^{\mu\nu}{}_{,\nu} = 0. \tag{31.51}$$

When we considered self-interacting scalar elasticity sound waves we showed how these four equations could, in the ideal fluid limit, be cast into a more useful form as evolutionary equations for the energy density and for a 3-velocity $\mathbf{v}$. Here we will repeat that analysis, but using the conventional notation for cosmological density fields. The stress-energy tensor for an ideal fluid is

$$T^{\mu\nu} = (\rho + P/c^2)U^\mu U^\nu + \eta^{\mu\nu}P \tag{31.52}$$

where the 4-velocity $\vec U = (\gamma c, \gamma\mathbf{v})$ is that of observers who perceive a stress-energy tensor $T^{\mu\nu} = {\rm diag}(c^2\rho, P, P, P)$; i.e. for observers comoving with the fluid.

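If we set $\mu = i$ in (31.51) and make use of (31.51) with $\mu = 0$ we obtain the relativistic Euler equation

$$\frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\frac{1}{\gamma^2(c^2\rho + P)}\left[c^2\nabla P + \mathbf{v}\frac{\partial P}{\partial t}\right] \tag{31.53}$$

Dotting (31.51) with $\vec U$ gives

$$0 = U_\mu T^{\mu\nu}{}_{,\nu} = U_\mu\frac{\partial}{\partial x^\nu}\left[(\rho + P/c^2)U^\mu U^\nu\right] + U^\nu\frac{\partial P}{\partial x^\nu} \tag{31.54}$$

which, with $\vec U = (\gamma c, \gamma\mathbf{v})$ and using $\partial(\vec U\cdot\vec U)/\partial t = 0$ gives the relativistic energy equation

$$\frac{\partial\rho}{\partial t} + (\mathbf{v}\cdot\nabla)\rho = -\frac{(\rho + P/c^2)}{\gamma}\left[\frac{\partial\gamma}{\partial t} + \nabla\cdot(\gamma\mathbf{v})\right] \tag{31.55}$$

Specializing to the case of a radiation density dominated plasma for which $P = \rho c^2/3$, (31.53) and (31.55) become

$$\frac{d\mathbf{v}}{dt} = -\frac{1}{4\gamma^2\rho}\left[c^2\nabla\rho + \mathbf{v}\frac{\partial\rho}{\partial t}\right] \tag{31.56}$$

$$\frac{d\rho}{dt} = -\frac{4}{3}\rho\left[\nabla\cdot\mathbf{v} + \frac{1}{\gamma}\frac{d\gamma}{dt}\right] \tag{31.57}$$

where $d/dt = \partial/\partial t + \mathbf{v}\cdot\nabla$ is the total, or convective, time derivative.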

Equations (31.56), (31.57) admit exact solutions corresponding to a homogeneous expanding Universe with un-decelerated linear Hubble expansion $\mathbf{v} = H\mathbf{x} = \mathbf{x}/t$. Such solutions do not have $\rho(\mathbf{r},t) = \rho(t)$, as one might perhaps have expected. This is because in a homogeneous expanding Universe, the density is not constant on slices of constant coordinate time $t$; rather it is constant on surfaces of constant proper time since the big bang. For freely expanding matter in Minkowski space-time, the proper time is $\tau = \sqrt{t^2 - x^2/c^2} = t/\gamma$. Surfaces of constant $\tau$ are therefore hyperbolae in $x, t$ space. If we look for a solution with

$$\rho(\mathbf{x},t) = \bar\rho(\tau) \tag{31.58}$$

we find that the factor in brackets on the right hand side of the Euler equation is

$$c^2\nabla\rho + \mathbf{v}\frac{\partial\rho}{\partial t} = \frac{d\bar\rho}{d\tau}\frac{(\mathbf{v}t - \mathbf{x})}{\tau} \tag{31.59}$$

but $\mathbf{x} = \mathbf{v}t$, so the right hand side of (31.56) vanishes. Therefore $d\mathbf{v}/dt = 0$, and so fluid elements do indeed move with constant velocities. If $d\mathbf{v}/dt = 0$ then $d\gamma/dt = 0$ also, and therefore the energy equation (31.57) becomes, in this context,

$$\frac{d\rho}{dt} = -\frac{4}{3}\rho\nabla\cdot\mathbf{v} = -4\frac{\rho}{t} \tag{31.60}$$

where we have used $\nabla\cdot\mathbf{v} = (1/t)\nabla\cdot\mathbf{x} = 3/t$, or, since $t = \gamma\tau$, and $\gamma$ is constant along any fluid element world-line

$$\frac{d\bar\rho}{d\tau} = -4\frac{\bar\rho}{\tau} \tag{31.61}$$

which does indeed allow a solution with $\rho = \bar\rho(\tau)$, specifically $\bar\rho \propto 1/\tau^4$. Since this model is freely expanding (i.e. the scale-factor $a$ is proportional to $\tau$) this corresponds to the usual $\rho \propto 1/a^4$ behavior for relativistic matter.
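
A quick numerical sanity check of this background solution, as a minimal sketch: the function names, the unit choice $c = 1$, the normalization of $\bar\rho$ and any sample velocity are assumptions made for illustration. The code follows a fluid element on the world-line $x = vt$, evaluates $\rho = \tau^{-4}$ there, and compares a centred finite-difference estimate of $d\rho/dt$ with the value $-4\rho/t$ from (31.60).

```python
import math


def rho_on_worldline(v, t, c=1.0):
    """Density rho = tau**-4 seen by a fluid element moving at constant speed v (x = v*t)."""
    tau = t * math.sqrt(1.0 - (v / c) ** 2)  # proper time since the big bang
    return tau ** -4


def energy_equation_residual(v, t, h=1e-5):
    """Relative mismatch between d(rho)/dt along the world-line and -4*rho/t."""
    drho_dt = (rho_on_worldline(v, t + h) - rho_on_worldline(v, t - h)) / (2.0 * h)
    expected = -4.0 * rho_on_worldline(v, t) / t
    return abs(drho_dt - expected) / abs(expected)
```

The residual should be small (limited only by the finite-difference step) for any sub-luminal $v$, reflecting the fact that $\gamma$ is constant along each world-line so that $\rho \propto \gamma^4 t^{-4}$ there.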

Having established the exact homogeneous expanding solution, we now want to look at the evolution of perturbations about this ‘background’. In the above analysis we worked in Minkowski space coordinates $(\mathbf{x}, ct)$, but this proves cumbersome when we add perturbations. Instead we will consider the density and velocity to be functions not of $\mathbf{x}, t$ but of the dimensionless comoving spatial coordinate $\mathbf{r} = \mathbf{x}/ct$ and the proper time $\tau$. We can readily infer the equations governing the evolution of $\rho(\mathbf{r},\tau)$ and $\mathbf{v}(\mathbf{r},\tau)$ from equations (31.56), (31.57). Consider a point in space-time where the velocity of the fluid vanishes. At that point, intervals of coordinate time $t$ and proper time $\tau$ are identical, $\gamma = 1$, and so these equations become

$$\frac{d\mathbf{v}}{d\tau} = -\frac{c^2}{4\rho}\nabla\rho \tag{31.62}$$

$$\frac{d\rho}{d\tau} = -\frac{4}{3}\rho\nabla\cdot\mathbf{v} \tag{31.63}$$

(note that vanishing of $\mathbf{v}$ does not imply that $\nabla\cdot\mathbf{v} = 0$). But we can always make a Lorentz transformation to make the velocity at any point vanish, and therefore equations (31.62), (31.63) apply everywhere, with the understanding that $\nabla$ denotes the gradient with respect to physical displacement in the rest-frame of the fluid, i.e. it is the spatial gradient on surfaces of constant proper time $\tau$. It is easy to check that the zeroth order solutions

$$\rho = \rho_0 \propto 1/\tau^4 \tag{31.64}$$

$$\mathbf{v} = \mathbf{v}_0 = \mathbf{x}/t = c\mathbf{r} \tag{31.65}$$


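satisfy these equations. Let us now look for solutions $\rho = \rho_0 + \rho_1$ and $\mathbf{v} = \mathbf{v}_0 + \mathbf{v}_1$. Substituting these in (31.62), (31.63) and discarding all but the first order terms yields

$$\dot{\mathbf{v}}_1 = -\frac{c}{4\rho_0\tau}\nabla_r\rho_1 \tag{31.66}$$

$$\dot\rho_1 = -\frac{4\rho_0}{3c\tau}\nabla_r\cdot\mathbf{v}_1 - \frac{4\rho_1}{3c\tau}\nabla_r\cdot\mathbf{v}_0 \tag{31.67}$$

where $\nabla_r = a\nabla$ (with $a = c\tau$ here) denotes spatial derivative with respect to comoving coordinate $\mathbf{r}$ and dot denotes derivative with respect to $\tau$. Now $\nabla_r\cdot\mathbf{v}_0 = 3c$ so (31.67) can be written as

$$\dot\rho_1 + 4\frac{\rho_1}{\tau} = -\frac{4\rho_0}{3c\tau}\nabla_r\cdot\mathbf{v}_1. \tag{31.68}$$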


Taking the derivative of this with respect to proper time gives

$$\ddot\rho_1 = -\frac{d(\rho_0/\tau)/d\tau}{\rho_0/\tau}\,\frac{4\rho_0}{3c\tau}\nabla_r\cdot\mathbf{v}_1 - \frac{4\rho_0}{3c\tau}\nabla_r\cdot\dot{\mathbf{v}}_1 - 4\frac{d(\rho_1/\tau)}{d\tau} \tag{31.69}$$

Now $\rho_0 \propto 1/\tau^4$ so $d(\rho_0/\tau)/d\tau = -5\rho_0/\tau^2$, and using equation (31.68) to eliminate $\nabla_r\cdot\mathbf{v}_1$ and (31.66) for $\dot{\mathbf{v}}_1$ yields a second order equation for $\rho_1$:

$$\ddot\rho_1 + 9\dot\rho_1/\tau + 16\rho_1/\tau^2 - \frac{1}{3\tau^2}\nabla_r^2\rho_1 = 0. \tag{31.70}$$

However, for the waves of interest here, for which $\lambda \ll ct$, the last term is much larger in magnitude than the penultimate term and we therefore have, in this limit,

$$\ddot\rho_1 + 9\dot\rho_1/\tau - \frac{1}{3\tau^2}\nabla_r^2\rho_1 = 0. \tag{31.71}$$

We can now treat this exactly as we did for waves in the matter dominated era: letting $\rho_1 = \chi\tau^\alpha$, but now with $\alpha = -9/2$, eliminates the damping term and (again dropping a term of order $\chi/\tau^2$) results in the oscillator equation

$$\ddot\chi - \frac{1}{3\tau^2}\nabla_r^2\chi = 0 \tag{31.72}$$

with frequency $\Omega = k/(\sqrt{3}\tau)$, so adiabatic invariance tells us that the $\chi$-field fluctuations evolve as $\chi \propto 1/\sqrt{\Omega} \propto \tau^{1/2}$ and therefore the amplitude of the density perturbation evolves as $\rho_1 \propto 1/\tau^4$. Thus, in this model, the density perturbation amplitude evolves exactly as the background density and the fractional density perturbation amplitude is constant:

$$\delta = \frac{\rho_1}{\rho_0} \propto \tau^0. \tag{31.73}$$
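
As a cross-check on this result (a worked sketch added for illustration, with $k$ denoting the dimensionless comoving wavenumber conjugate to $\mathbf{r}$, a symbol assumed here), note that for a single Fourier mode equation (31.70) is equidimensional in $\tau$ and so has exact power-law solutions:

```latex
% Single Fourier mode: \nabla_r^2 \rho_1 \to -k^2 \rho_1 in (31.70)
\ddot\rho_1 + \frac{9}{\tau}\dot\rho_1 + \frac{16 + k^2/3}{\tau^2}\rho_1 = 0 .
% Trial solution \rho_1 \propto \tau^p gives the indicial equation
p(p-1) + 9p + 16 + \frac{k^2}{3} = (p+4)^2 + \frac{k^2}{3} = 0
\quad\Longrightarrow\quad
p = -4 \pm \frac{ik}{\sqrt{3}} ,
% so that
\rho_1 \propto \tau^{-4}\exp\!\left(\pm\frac{ik}{\sqrt{3}}\ln\tau\right),
\qquad
\delta = \frac{\rho_1}{\rho_0} \propto \exp\!\left(\pm\frac{ik}{\sqrt{3}}\ln\tau\right).
```

The oscillation phase advances at the rate $k/(\sqrt{3}\tau)$ per unit proper time, which matches the frequency found from the oscillator equation, while $|\delta|$ stays constant, consistent with (31.73).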


31.2.3 Super-Horizon Scale Perturbations 

The rigorous treatment of perturbations with $\lambda \gg ct$ is quite involved, and there are some subtleties in defining the perturbation amplitude. In the Newtonian analysis we define the perturbation to be the variation in the density at a given time. For super-horizon scale waves one needs to carefully define the hyper-surface on which one defines the density contrast. One could, for example, take these hyper-surfaces to be surfaces of constant density, in which case the density perturbation would vanish identically. This does not mean that super-horizon scale perturbations are ill-defined, since in this case the expansion law would not be uniform. The results of the more detailed analysis, which we shall not cover here, are in accord with the picture which emerges from the spherical model that an over-dense region can be thought of as part of a different universe (i.e. one with a smaller radius of curvature) evolving independently. These perturbations have constant spatial curvature or gravitational potential, provided $\lambda \gg c_s t$. In the matter dominated case we had $\delta\phi = {\rm constant} \sim \delta M/R \sim (R^3\delta\rho)/R \sim (\delta\rho/\rho)\rho R^2$. Constancy of $\delta\phi$ and $\rho \propto 1/R^3$ then implies $\delta\rho/\rho \propto R \propto a$. In the radiation dominated case, the same hand-waving argument, but now with $\rho \propto 1/R^4$, would suggest that $\delta\rho/\rho \propto a^2$, which is in agreement with the more sophisticated analysis that can be found in Peebles’ book for instance.


31.2.4 Isocurvature vs Isentropic Perturbations 


The type of perturbation we have described above is an adiabatic perturbation. This is the perturbation one makes if one takes the matter and radiation in some region and compresses it; the ratio of photons to baryons is fixed and consequently $\delta_{\rm rad} = (4/3)\delta_b$ and there is a non-zero perturbation in the total matter density

$$\delta = \frac{\rho_{\rm rad}\delta_{\rm rad} + \rho_b\delta_b}{\rho_{\rm rad} + \rho_b} \tag{31.74}$$

as illustrated in the left hand panel of figure 31.2.

Figure 31.2: The left hand panel shows an ‘adiabatic’ or ‘isentropic’ perturbation of the kind we have been discussing above. In such a perturbation we crush the matter and radiation together. Such perturbations have a net density inhomogeneity and consequently have non-zero curvature or potential perturbations. A very natural alternative is to generate a perturbation in which the initial perturbation in the baryon density is cancelled by a corresponding under-density in the radiation. Such perturbations have, initially, no net density perturbation and therefore no associated curvature perturbation, and are called ‘isocurvature’. For super-horizon scale perturbations the curvature is frozen in, but there is a non-zero pressure gradient, and once the perturbations enter the horizon this becomes effective and will act to annul the pressure gradient. In the example shown in the center panel, there is an inward directed pressure gradient which will act to erase the under-density in radiation, but in doing so will enhance the over-density in the baryons. The radiation density will over-shoot and one will have an oscillation about a state in which the radiation is uniform. The equilibrium state about which these oscillations will occur is shown in the right panel and is known as an ‘isothermal’ perturbation, since the radiation density, and therefore also the temperature, are constant.

An alternative that used to be considered is an isothermal perturbation where the radiation is unperturbed initially. Nowadays we think of multi-component fluids comprising e.g. pressure free cold dark matter, plus a tightly coupled plasma composed of baryons and radiation. A similar ambiguity then arises as to how to set up the initial perturbation. One very natural type of perturbation is to say that initially there were no curvature fluctuations, and the perturbation is produced by varying the relative fractions of baryons to photons with the excess in any one fluid being compensated by a deficit in the other. Such a perturbation is termed an isocurvature perturbation and is the descendant of the old style isothermal perturbation. The alternative is the so-called isentropic perturbation in which the fluids are compressed together and which is the descendant of the old-style adiabatic perturbation. The terminology here arises because the number of photons per baryon is equivalently the entropy per baryon.
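
To illustrate the distinction, here is a minimal sketch (the function names and any sample densities are assumptions for illustration) that evaluates the total density contrast (31.74) for the two kinds of initial condition: an isentropic perturbation with $\delta_{\rm rad} = (4/3)\delta_b$, and an isocurvature perturbation in which the radiation deficit is chosen to cancel the baryon excess.

```python
def total_contrast(rho_rad, rho_b, delta_rad, delta_b):
    """Density-weighted mean contrast of the two-component fluid, as in (31.74)."""
    return (rho_rad * delta_rad + rho_b * delta_b) / (rho_rad + rho_b)


def isentropic(rho_rad, rho_b, delta_b):
    """Compress matter and radiation together: delta_rad = (4/3) * delta_b."""
    return total_contrast(rho_rad, rho_b, 4.0 * delta_b / 3.0, delta_b)


def isocurvature_delta_rad(rho_rad, rho_b, delta_b):
    """Radiation contrast that cancels the baryon contrast, so the net
    density perturbation (and hence the curvature perturbation) vanishes."""
    return -rho_b * delta_b / rho_rad
```

Feeding the output of `isocurvature_delta_rad` back into `total_contrast` gives zero net contrast, whereas the isentropic case always returns a contrast of the same sign as $\delta_b$.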


31.2.5 Diffusive Damping and Free-Streaming 


The Jeans analysis assumed a tightly coupled baryon-photon plasma. There are two situations where 
this is inappropriate. 

At early times the plasma is very optically thick, but as the universe expands the mean-free path 
for photons increases and the photons tend to leak out of the sound waves and consequently they 
damp out. This was first analyzed by Joe Silk and is called Silk damping. Around z = 1000, the ions 
and electrons (re)combine and become neutral and the photons are no longer locked to the baryons. 
The consequence of this is that sufficiently small scale perturbations (interestingly close to the mass 
of galaxies as it happens) are damped out. Nowadays the emphasis is on dark matter models where 
the damping of acoustic fluctuations is not so critical. 

Another interesting (and possibly very relevant) situation is that of hot dark matter such as a neutrino with a mass on the order of 10 eV. Such particles have large thermal velocities, but are not locked to the other components by scattering. One can still define a neutrino Jeans length which is the distance a neutrino can travel in one expansion time. This starts off small (in comoving terms), grows and peaks around the time the neutrinos go non-relativistic. Once the neutrinos become non-relativistic their velocities decay adiabatically as $v \propto 1/a$ according to equation (31.27) and the comoving Jeans length decreases. As with acoustic waves, perturbations bigger than the Jeans length grow, but smaller perturbations damp out very rapidly as the neutrinos stream out of the perturbation.


31.3 Scenarios 

We will now put these results to work and explore a number of scenarios for structure formation. 

31.3.1 The Adiabatic-Baryonic Model 

The first scenario to be worked out in any detail was the adiabatic baryonic model (Peebles and Yu). In this calculation one assumes some initial adiabatic density fluctuations imposed at some very early epoch, and one assumes that the universe is dominated by baryons and radiation. One Fourier decomposes these into plane wave modes and then calculates the temporal evolution of each mode separately. At sufficiently early times all the modes have $\lambda \sim a/k \gg t$, pressure gradients are negligible and the perturbations grow in step with $\delta \propto a^2$. Eventually, short wavelength perturbations enter the horizon. For galaxy scale perturbations ($\lambda_0 \sim 1$ Mpc) this happens when the universe is radiation dominated. These perturbations then oscillate as acoustic waves with constant amplitude. As time goes on progressively longer wavelengths enter the horizon and start oscillating. Eventually we reach the epoch $z_{\rm eq}$ when $\rho_{\rm rad} = \rho_{\rm matter}$ and then something rather interesting happens. Before $z_{\rm eq}$, $P \simeq \rho/3$, $c_s \simeq 1/\sqrt{3}$ and the ‘sound horizon’ $\sim c_s t/a$ grows like $a \propto t^{1/2}$. After $z_{\rm eq}$, $P/\rho \propto 1/a$, the sound speed falls like $1/\sqrt{a}$, $t \propto a^{3/2}$ and consequently the comoving sound horizon (or comoving Jeans length) is approximately constant. At $z \simeq 1000 \simeq z_{\rm eq}/10$, the plasma recombines and the pressure support from the radiation is lost, the Jeans length falls, and all perturbations grow according to the usual $\delta \propto a$ law. The various phases of the life of an adiabatic perturbation are shown schematically in figure 31.3.

The result of this calculation can be expressed in terms of a transfer function $T(k)$ expressing the final amplitude of a perturbation of (comoving) wavenumber $k$ relative to its initial amplitude. For a power-law initial power spectrum $P(k) \propto k^n$ for instance, the final power spectrum is $P(k) \propto k^nT^2(k)$. This linear output spectrum can then be given as initial conditions to a numerical simulator who can evolve the density fluctuations into the non-linear regime and compare the results with e.g. galaxy clustering data. The transfer function for the adiabatic baryonic model has some rather interesting features. For wavelengths longer than the maximum Jeans length (roughly the horizon size at $z_{\rm eq}$) $T(k)$ is constant, but for shorter wavelengths there are oscillations. These arise because one assumes that the perturbations are purely in the growing mode at $t_i$, so the phase of the sound wave $\psi = \int\omega(t)\,dt$ at $z_{\rm dec}$ is a smooth and monotonically increasing function of $k$. If one models the decoupling process as instantaneous, it is easy to show that the amplitude of the growing mode perturbation is then an oscillatory function of $k$. This results in nodes and anti-nodes (see figure 31.4) and therefore quasi-periodic bumps in the power spectrum. These oscillations extend to about a factor 10 in wavelength, below which they are damped out by photon diffusion. There is a large bump in $T(k)$ at around the maximum Jeans length. This comes about because of the plateau in the Jeans length; a mode with wavelength just greater than $\lambda_{J{\rm max}}$ will grow uninterrupted while one of slightly shorter wavelength will oscillate with slowly decreasing amplitude between $z_{\rm eq}$ and $z_{\rm dec}$.

This calculation really marked a turning point in structure formation. While the creation of the initial fluctuations was still a subject of speculation, it seemed reasonable to assume a power law initial state, and one then obtained a quantitative prediction for the post-$z_{\rm eq}$ power spectrum. This model has only two free parameters: the initial amplitude of the power spectrum and the initial spectral index $n$. Moreover, this spectrum had two prominent features: the bump at $\omega_{H{\rm eq}}$ and the damping cut-off. The present horizon size is $\sim 1/H_0 \sim 3000$ Mpc, and since $\omega_H \propto t/a \propto a^{1/2} \propto z^{-1/2}$, $a_0\omega_{H{\rm eq}} \simeq 30$ Mpc which is the scale of superclusters, the largest prominent structures we observe in the universe; a remarkable fact indeed! The damping scale sets a minimum scale for the first structures to form, and it is obviously attractive to identify this scale with galaxies in this theory.
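
The arithmetic behind the 30 Mpc figure can be sketched as follows. The function name is an assumption, and the value $z_{\rm eq} \approx 10^4$ is inferred from the earlier statement that decoupling at $z \simeq 1000$ corresponds to $z_{\rm eq}/10$; the matter-era scaling of the comoving horizon as $z^{-1/2}$ is applied all the way from today back to $z_{\rm eq}$, as in the text.

```python
def comoving_horizon_mpc(z, horizon_today_mpc=3000.0):
    """Comoving horizon at redshift z, scaled from today's value as z**-0.5."""
    return horizon_today_mpc * z ** -0.5


horizon_at_eq = comoving_horizon_mpc(1.0e4)  # roughly 30 Mpc
```

This is an order-of-magnitude estimate only; it uses $z$ rather than $1+z$ and ignores any departure from the matter-dominated scaling.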
Figure 31.3: [Schematic plot against time, with $t_{\rm eq}$ and $t_{\rm dec}$ marked on the time axis.] Evolution of initially adiabatic (or isentropic) perturbations is shown schematically for perturbations of three different wavelengths. The perturbation passes through three phases. First, when outside the horizon, the perturbation amplitude grows as $\delta \propto a^2$ in the radiation era and as $\delta \propto a$ in the matter dominated era. Perturbations which enter the horizon before $t_{\rm eq}$ oscillate at constant amplitude until $t_{\rm eq}$. For $t_{\rm eq} < t < t_{\rm dec}$ the amplitude decays adiabatically as $\delta \propto 1/a^{1/4}$. Short wavelength perturbations are, in addition, subject to diffusive damping, and are strongly attenuated. Perturbations which persist to $t_{\rm dec}$ then couple to growing and decaying perturbations in the now pressure-free neutral gas.

Figure 31.4: [Schematic plot of $k^3P(k)$ against $\log(k)$.] Power spectrum in the adiabatic-baryonic model (schematic). The dashed line indicates the initial power spectrum. The main peak is at a scale just larger than the maximum Jeans length, where the perturbations underwent continuous growth. Shorter waves entered the horizon before $z_{\rm eq}$ and subsequently oscillated, so their amplitude is suppressed. The nodes in the output spectrum are those wavelengths which have zero amplitude in the growing mode at the time of decoupling. The cut-off in the power spectrum at high $k$ is due to diffusive damping.

This theory therefore has much to commend it. Its main weakness is that it does not include dark matter. Dark matter seems to dominate over baryonic matter in clusters of galaxies (least-ways it is hard to see how the dark matter there could be in a baryonic form) and big-bang nucleosynthesis predicts an uncomfortably low baryonic density parameter: $\Omega_b \simeq 0.02/h^2 \simeq 0.04$. Such a low density means that perturbations freeze out quite early (when $\Omega$ starts to peel away from unity) and this makes it hard to account for both the smallness of the microwave background fluctuations and the presence of galaxies etc. at $z \gtrsim 3$.

As well as the empirical evidence for substantial quantities of dark matter, which implied a density parameter $\Omega \simeq 0.2$, and possibly more if galaxies are biased with respect to dark matter, there was the natural repugnance on the part of theorists to the idea that $\Omega$ be quite close to unity, but not exactly so. These facts and prejudices led to the development of models containing large amounts of dark matter.

31.3.2 The Hot-Dark-Matter Model 

The first non-baryonic model to be seriously studied was the hot dark matter model (HDM), with the candidate particle being a massive neutrino. Neutrinos are produced in thermal abundance at high temperatures, but the weak interactions freeze out around an MeV or so and the neutrinos are thereafter effectively uncoupled from the rest of matter save through their gravitational influence. The expected number density of a light neutrino species is then roughly the same as the number density of photons in the microwave background; this is only a rough equality because the number of photons was boosted somewhat when the electrons and positrons annihilated, and from consideration of degrees of freedom. Thus if the neutrinos have a mass such that they would go non-relativistic at around $z_{\rm eq}$ they would have a density comparable to the critical density today. This requires $m \sim 30$ eV, and this theory got a substantial boost when there was experimental evidence to suggest a mass of this order (Lyubimov).

In the HDM model, fluctuations on small scales are damped out by free-streaming. The characteristic smoothing length in this theory is just the comoving distance traveled by a neutrino. At early times, $z > z_{\rm eq}$, the neutrinos are relativistic and the comoving distance traveled is just the horizon size $\omega_H$; after they go non-relativistic the velocity of the neutrinos redshifts as $v \propto 1/a$, so the comoving distance traveled per expansion time is $\propto vt/a \propto t/a^2 \propto a^{-1/2}$ which decreases with time. Thus, the total distance traveled is on the order of the horizon size at $z_{\rm eq}$, which, as we have seen, is roughly the scale of superclusters today.

As there is essentially no power remaining on small scales, the first structures to go non-linear 
are superclusters and galaxies must then form by fragmentation; this is called a top down scenario, 
as opposed to a bottom up scenario in which structures form first on small scales and then cluster 
hierarchically into progressively larger entities. 

The formation of structure in HDM-like models was analyzed extensively by Zel’dovich and 
co-workers and can be understood analytically in terms of his beautiful approximation in which 
structures form by ‘pancaking’. While the gross features of large-scale structure predicted in the 
HDM model agree nicely with the impression of filamentary or sheet-like structure seen in galaxy 
surveys, it seems hard to reconcile the early appearance of quasars and radio galaxies and the 
absence of neutral gas at high redshift — the Gunn-Peterson test — with the fact that the large 
scale structure appears to be forming today. 

31.3.3 The Cold Dark Matter Model 

The cold dark matter model (CDM) also assumes that the bulk of the matter is non-baryonic, but 
that this material is not in thermal equilibrium at early times. One possible candidate for CDM is 
the axion (see Kolb and Turner) in which the density resides in coherent oscillations of a scalar field. 
For the present purposes all we need to know about the axion (or whatever) is that it behaves just 
like zero-pressure dust. 

Figure 31.5: [Schematic plot of $k^3P(k)$.] Power spectrum in the hot dark matter (HDM) model (schematic). The long-dashed line indicates the initial power spectrum. The vertical dashed line indicates the horizon size at the time the neutrinos become non-relativistic at $z \simeq z_{\rm eq}$. In the HDM model the first structures to form are super-cluster scale, and smaller scale-structures must form by fragmentation.

If we track the evolution of say a galactic scale perturbation then at early times $\lambda \gg t$, pressure gradients are negligible, and the perturbation grows in the usual energy/curvature conserving manner. The perturbation enters the horizon in the radiation dominated era; the baryon-photon plasma then oscillates acoustically and the growth of the sub-dominant CDM component stagnates. The fluctuations are not erased however, as they are in the HDM model, so at $z_{\rm eq}$ the fluctuations can start to grow. On scales exceeding the horizon size at $z_{\rm eq}$ there is uninterrupted growth. The end result is a suppression of small scale fluctuations by a factor $\sim(\lambda/\lambda_{H{\rm eq}})^{-2}$ in amplitude. As we shall see, the preferred initial spectrum has $\delta \propto 1/\lambda^2$ initially, so on small scales the amplitude of the fluctuations becomes asymptotically constant (see figure 31.6). The details have been worked out numerically (Bond and Efstathiou, 1984; Vittorio and Silk, 1984) and it turns out that the progression from the large-scale to small scale asymptotes is very gradual, and consequently one has in effect a hierarchical or ‘bottom-up’ scenario.
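
The scaling argument in this paragraph can be sketched numerically. The function below is an illustrative assumption rather than a fitted transfer function: it takes the initial amplitude $\delta \propto \lambda^{-2}$ and reduces modes shorter than the horizon scale at $z_{\rm eq}$ by the relative factor $(\lambda/\lambda_{H{\rm eq}})^2$, with a sharp break where the numerical results quoted above show a very gradual one.

```python
def cdm_amplitude(wavelength, horizon_eq, norm=1.0):
    """Schematic linear amplitude versus wavelength after z_eq (arbitrary normalization)."""
    initial = norm * (horizon_eq / wavelength) ** 2  # delta ~ 1/lambda^2 initially
    if wavelength >= horizon_eq:
        return initial  # uninterrupted growth on large scales
    # Growth stagnated between horizon entry and z_eq on small scales.
    return initial * (wavelength / horizon_eq) ** 2
```

With these assumptions the amplitude is flat for all wavelengths below the break and falls as $\lambda^{-2}$ above it, which is the asymptotic behavior described in the text.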

The CDM model has been explored in much greater detail than any of the other scenarios, and 
in many ways makes predictions which agree very well with observations; certainly the qualitative 
predictions fit very nicely with what is seen. In recent years, however, the simplest, and most 
attractive, version of the theory — in which the total density parameter is unity — has come under 
attack from observations of large-scale clustering; there is more power on large scales than the theory 
predicts. This will be discussed in greater depth below. 
Figure 31.6: Power spectrum in the cold dark matter (CDM) model (schematic). The long-dashed line indicates the initial power spectrum. The vertical dashed line indicates the horizon size at the time the neutrinos become non-relativistic at $z \simeq z_{\rm eq}$.

Chapter 32

Origin of Cosmological Structure 


Having considered the evolution of density perturbations from some given initial state we now explore 
how the initial seeds for structure may have arisen. We first consider the ‘spontaneous’ generation 
of fluctuations from the effect of non-gravitational forces in the hot big bang model, and show that 
it is very difficult to generate large-scale structure in this way. We then consider the generation 
of density fluctuations from quantum fluctuations in the scalar field during inflation and finally we 
consider topological defects. 

Before embarking on these calculations, it is worth describing what seems to be required
observationally. In fact, long before inflation, when cosmological structure formation was still a
relatively immature subject, Harrison and Zel’dovich pointed out that if the initial fluctuations
had a power-law spectrum $P(k) \propto k^n$ extending over a very wide range of scales then it should have
index $n = 1$. The argument is that for a power law, the fluctuations in the potential, and therefore in
the curvature, also have a power-law spectrum. For most spectral indices, the curvature fluctuations
will diverge either at small scales or at large scales. This would result in small black holes if $n$ is
too large, or would lead to the universe being highly inhomogeneous on large scales if $n$ is too small.
The ‘happy medium’ (in which the curvature fluctuations diverge at both small and large scales,
but only logarithmically) is that for which the root mean square density fluctuations scale as
$\delta\rho/\rho \propto 1/\lambda^2$, so the potential fluctuations $\delta\phi \sim (H\lambda/c)^2 \delta\rho/\rho$ are independent of $\lambda$. For a power-law
power spectrum, the variance per octave of wave-number is $\langle(\delta\rho/\rho)^2\rangle_k \sim k^3 P(k) \propto k^{3+n} \propto \lambda^{-(3+n)}$,
so that $\delta\rho/\rho \propto \lambda^{-(3+n)/2}$. Thus, for $n = 1$, the potential fluctuations are scale invariant. This is
known as the Harrison-Zel’dovich spectrum. While somewhat philosophically motivated, this kind of
spectral index has much to commend it. Gott and Rees had argued that the structure we see on scales
of galaxies, clusters and super-clusters seemed to require a spectral index for the perturbations
emerging after $z_{\rm eq}$ of $n \simeq -1$. This is not the Harrison-Zel’dovich index, but allowing for the
suppression of the growth of small-scale perturbations by the physical processes described above,
which operate around $z_{\rm eq}$, these are consistent. The real clincher for the $n = 1$ spectrum came with the
detection by COBE of roughly scale-invariant ripples in the large-angle anisotropy of the CMB.
Normalizing the spectrum to cluster or super-cluster scale structures, these fit very nicely to an
extrapolation to larger scales using the Harrison-Zel’dovich spectrum.
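
The scaling argument above is easy to check with a few lines of arithmetic. The following is a minimal sketch (the function names are ours, chosen for illustration) that takes the two relations quoted in the text, $\delta\rho/\rho \propto \lambda^{-(3+n)/2}$ and $\delta\phi \propto \lambda^2 \delta\rho/\rho$, and returns the logarithmic slopes with wavelength for a given spectral index.

```python
from fractions import Fraction

def density_slope(n):
    """d ln(rms delta rho / rho) / d ln(lambda) for P(k) ~ k^n (n an integer)."""
    return -Fraction(3 + n, 2)

def potential_slope(n):
    """d ln(rms delta phi) / d ln(lambda), using delta phi ~ lambda^2 * delta rho / rho."""
    return 2 + density_slope(n)

# Tabulate a few indices; only n = 1 gives a scale-invariant potential.
for n in (-1, 0, 1, 2, 4):
    print(n, density_slope(n), potential_slope(n))
```

A positive potential slope means the curvature fluctuations grow towards large scales, a negative one that they grow towards small scales; the slope vanishes only for $n = 1$.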


32.1 Spontaneous Generation of Fluctuations 

Consider an initially homogeneous universe and let the pressure become inhomogeneous (this might
happen during a phase transition in the early universe, or at much later times when stars form and
explode). This non-gravitational force will generate density perturbations. For a spontaneous
cosmological phase transition the pressure fluctuations should be uncorrelated on scales larger than the
horizon scale at that time. Similarly, a natural model for the pressure perturbation from randomly
exploding stars has a flat power spectrum on large scales. What is the amplitude of mass fluctuations
on large scales generated by such a process? The answer is very little. Naively, one might imagine
that there would be ‘root-$N$’ perturbations, with $N$ the number of independent fluctuation regions,
giving $\delta\rho/\rho \propto M^{-1/2}$. Alternatively, one might imagine there would be ‘surface fluctuations’ giving
$\delta\rho/\rho \propto M^{-5/6}$. Now it is true that if we measure the density within a sharp-edged top-hat sphere,
then there will be fluctuations in the mass of this latter order, but the fluctuations in growing modes
will be much smaller than this; the amplitude of the growing mode is in fact $\delta\rho/\rho \propto M^{-7/6}$.

Let’s first obtain this result from a Newtonian analysis. What we shall do is compute the
perturbation to the large-scale gravitational potential $\delta\phi$, since this is associated with the growing
mode density perturbations, from which we can obtain $\delta\rho/\rho$. Consider first a homogeneous
expanding dust-filled cosmology containing an agent who can re-arrange the surrounding matter,
but can only influence material at distances $r < R$ (see figure 32.1). What is the perturbation to
the Newtonian potential at large scales? The potential is

$$ \delta\phi(\mathbf{r}) = -G \int d^3r' \, \frac{\delta\rho(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|} \qquad (32.1) $$


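where the integrand vanishes for $r' > R$. At large distances $r \gg R$, we can expand the factor
$1/|\mathbf{r}' - \mathbf{r}|$ as

$$ \frac{1}{|\mathbf{r}' - \mathbf{r}|} = \left(\mathbf{r}\cdot\mathbf{r} - (2\mathbf{r}'\cdot\mathbf{r} - \mathbf{r}'\cdot\mathbf{r}')\right)^{-1/2} = \frac{1}{r}\left(1 - \frac{2\mathbf{r}'\cdot\mathbf{r} - \mathbf{r}'\cdot\mathbf{r}'}{r^2}\right)^{-1/2}. \qquad (32.2) $$

Making a Taylor expansion gives

$$ \frac{1}{|\mathbf{r}' - \mathbf{r}|} = \frac{1}{r} + \frac{\mathbf{r}'\cdot\mathbf{r}}{r^3} + \frac{3(\mathbf{r}'\cdot\mathbf{r})^2 - r'^2 r^2}{2r^5} + \ldots \qquad (32.3) $$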


Using this in (32.1) gives an expansion in powers of $1/r$. The coefficient of the leading order term
(for which $\delta\phi \sim 1/r$) is $\int d^3r' \, \delta\rho(\mathbf{r}')$. This is the monopole moment of the mass distribution, but
this vanishes by virtue of conservation of mass. The next term has $\delta\phi \propto 1/r^2$, and has coefficient
proportional to $\mathbf{r} \cdot \int d^3r' \, \delta\rho(\mathbf{r}')\mathbf{r}'$, which involves the dipole moment. This vanishes by virtue of
momentum conservation. The next term has $\delta\phi \propto 1/r^3$ with coefficient proportional to the quadrupole
moment. This does not, in general, vanish; the agent can, for example, rearrange the matter into a
‘dumb-bell’ shaped configuration without exchanging any mass or momentum with the exterior (see
figure 32.1). If the mass contained within the perturbation region is $\Delta M$, the large-scale gravitational
potential is $\delta\phi(r) \sim G \Delta M R^2/r^3$, where $R$ is the scale of the fluctuation region, and this is smaller than
the un-shielded monopole term $G\Delta M/r$ by two powers of $R/r$.
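
The $1/r^3$ fall-off can be verified numerically. The sketch below is only an illustration under stated assumptions: four point-like mass perturbations (two positive on the x-axis, two negative on the y-axis) stand in for a ‘dumb-bell’ rearrangement with zero net mass and zero dipole moment, $G$ is set to unity, and the helper names are ours. It measures the logarithmic slope of $|\delta\phi|$ between two large radii.

```python
import math

def delta_phi(point, sources, G=1.0):
    """Newtonian potential perturbation -G * sum(dm / |r' - r|) at `point`."""
    total = 0.0
    for dm, pos in sources:
        total += dm / math.dist(point, pos)
    return -G * total

a = 1.0  # size of the rearranged region (arbitrary units)

# Zero net mass and zero dipole moment: a pure quadrupole-like rearrangement.
quadrupole = [(+1.0, (a, 0.0, 0.0)), (+1.0, (-a, 0.0, 0.0)),
              (-1.0, (0.0, a, 0.0)), (-1.0, (0.0, -a, 0.0))]

# For comparison: an 'un-shielded' net mass excess.
monopole = [(+1.0, (0.0, 0.0, 0.0))]

def log_slope(sources, r1=100.0, r2=200.0):
    """d ln|delta phi| / d ln r estimated between radii r1 and r2 on the x-axis."""
    p1 = abs(delta_phi((r1, 0.0, 0.0), sources))
    p2 = abs(delta_phi((r2, 0.0, 0.0), sources))
    return math.log(p2 / p1) / math.log(r2 / r1)

print("monopole slope:  ", log_slope(monopole))
print("quadrupole slope:", log_slope(quadrupole))
```

The monopole slope is $-1$ and the quadrupole slope approaches $-3$ for $r \gg a$, which is the two extra powers of $R/r$ quoted above.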

Now consider a multitude of such agents, with separation $\sim R$, each of whom re-arranges the
surrounding matter in accordance with mass and momentum conservation, but otherwise in a random
manner, such that different fluctuation regions are uncorrelated with each other (see figure 32.2). The
mean square large-scale potential, averaged over a region containing mass $M$, or size $r \sim (M/\rho)^{1/3}$,
is then the sum of $N \sim (r/R)^3 \sim M/\Delta M$ quadrupole sources adding in quadrature, so the root
mean square potential perturbation is

$$ \delta\phi_M = \langle(\delta\phi)^2\rangle_M^{1/2} \sim \sqrt{N}\,\frac{G \Delta M R^2}{r^3} \propto M^{-1/2}. \qquad (32.4) $$


The fluctuations in the potential are therefore a white-noise process. Now, for the growing mode, the
potential and density fluctuations are related by $\delta\phi \sim (Hr/c)^2 \delta\rho/\rho \propto M^{2/3} \delta\rho/\rho$, so the
growing mode density perturbations induced by this kind of small-scale ‘curdling’ have root mean square
amplitude $\delta\rho/\rho \propto M^{-7/6}$. The mass distribution on large scales at late times is much smoother than ‘root-$N$’
mass fluctuations, and smoother even than the ‘surface fluctuations’.
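
The three competing scalings are simple to compare using exact fractions. This is a small sketch, with names of our own choosing, that derives the growing-mode exponent from the two ingredients used above: a white-noise potential, $\delta\phi \propto M^{-1/2}$, and the relation $\delta\phi \propto M^{2/3} \delta\rho/\rho$.

```python
from fractions import Fraction

# Exponents p in (rms fluctuation) ~ M^p.
ROOT_N = Fraction(-1, 2)    # Poisson-like 'root-N' mass fluctuations
SURFACE = Fraction(-5, 6)   # 'surface fluctuations' in a sharp-edged sphere

def growing_mode_exponent(potential_exponent=Fraction(-1, 2)):
    """Exponent of delta rho / rho in the growing mode.

    Uses delta phi ~ M^potential_exponent together with
    delta phi ~ (H r / c)^2 delta rho / rho and r ~ M^(1/3).
    """
    return potential_exponent - Fraction(2, 3)

print(ROOT_N, SURFACE, growing_mode_exponent())
```

The growing-mode exponent comes out as $-7/6$, steeper than both of the naive estimates.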

The argument given above is Newtonian and assumes conservation of mass. Do these conclusions
still hold with fluctuations of the relativistic plasma? For example, consider a universe in
which the process of baryogenesis is spatially inhomogeneous. If the photon-to-baryon ratio (the
specific entropy, that is) is an incoherent random function of position, this will generate an
initially isocurvature perturbation such that the number density of baryons is a white-noise process,
but with the initial density fluctuation in the baryons being compensated by the radiation density.

Figure 32.1: Illustration of the type of perturbation that can be generated by a physical process
that operates locally (within the region delimited by the dashed circle). On the left is a monopole
perturbation. This has a net excess mass and would generate a potential perturbation at large
scales $\delta\phi \propto 1/r$. Such a perturbation is not allowed, since it requires importing mass from large
distances; if one is constrained only to re-arrange the mass within the dashed circle, then for a
symmetric mass configuration the net mass excess must vanish. In the center is shown a dipole
perturbation with an over-dense region on the right and an under-dense region on the left. Such a
perturbation would generate a large-scale gravitational potential $\delta\phi \propto 1/r^2$. The net mass excess
inside the dashed circle is now zero, but such perturbations are still not allowed as, in order to
generate such a perturbation, one would need to impart a net momentum to the matter. On the
right is a quadrupole perturbation. Such a perturbation can be generated by a local physical process
while still conserving mass and momentum. A quadrupole source generates a large-scale potential
perturbation $\delta\phi \propto 1/r^3$; this falls off much faster than for an ‘un-shielded’ monopole perturbation.


Now as the universe expands the radiation will redshift away and will eventually become negligible.
At late times then there will be fluctuations in the net proper mass contained within any comoving
region with rms amplitude scaling inversely as the square root of the number of fluctuation regions,
or $\delta\rho/\rho \propto M^{-1/2}$. Does this not conflict with the $M^{-7/6}$ rule? Not necessarily, since we do not
know what fraction of these perturbations is in the growing mode. To resolve this, recall the
behavior of spherical perturbations. To generate a decaying mode we delay the ‘bang-time’ keeping
the energy constant, and the proper mass contained in the perturbation is fixed. In the growing
mode we perturb the binding energy $\phi$. The gravitational mass of the perturbation must equal the
unperturbed mass, but as the binding energy is negative, we must actually have a slight
enhancement of the net proper mass $\delta M \sim -M\delta\phi$ within the perturbation. Thus, the fluctuations in proper
mass within comoving regions (which scale as $M^{-1/2}$ in this incoherent isocurvature model) measure
$\delta\phi \sim (H^2\lambda^2/c^2)\,\delta_{\rm growing}$ and we recover the $\delta\rho/\rho \propto M^{-7/6}$ behavior for the growing modes.


The large-scale growing perturbations produced by small-scale rearrangement of mass are
therefore very small, and this effectively excludes the possibility that the large-scale structure results from
curdling of the universe during a phase transition at early times, because the horizon size is small
then. It would also seem quite difficult to produce the largest scale structures seen from the
hydrodynamical effects of supernova explosions (though the fact that a simple estimate of the net energy
released based on the abundance of the results of nuclear burning in stars does not fall very far short
of what is desired is tantalizing). In any such scenario, accounting for the large-angle fluctuations
in the CMB is very difficult indeed, since the prediction is for temperature fluctuations falling off as
$\delta T/T \sim \delta\phi \propto \theta^{-3/2}$.

Figure 32.2: Schematic illustration of the type of density inhomogeneity that can be produced by
local re-arrangement of the matter. This is the type of perturbation generated by e.g. phase
transitions in the hot big bang model, where the effects of pressure gradients are limited by the horizon
size. Each fluctuation region generates a large-scale potential fluctuation $\delta\phi \sim G \Delta M R^2/r^3$, where
$\Delta M$ and $R$ are respectively the mass within and the size of a fluctuation region. The orientation,
and in a realistic model also the amplitude, of the potential fluctuation is a random variable.
Therefore the large-scale potential fluctuations are $\sqrt{N}$ times larger than the effect for a single region.
With $N \propto M$, the root mean squared potential fluctuation is $\delta\phi \propto \sqrt{M}/r^3 \propto 1/\sqrt{M}$. The
potential generated by such a process is therefore a ‘white-noise’ process, with a flat power spectrum
$P_\phi(k) =$ constant (i.e. spectral index $n = 0$). The mass fluctuations $\delta\rho$ in the growing mode are
related to the potential fluctuations by $\nabla^2\delta\phi = 4\pi G\delta\rho$, so $\delta\rho_k \sim G^{-1}k^2\delta\phi_k$ and the power spectrum
of the mass fluctuations is therefore $P_\rho(k) \sim \langle|\delta\rho_k|^2\rangle \propto k^4$. The spectral index is $n = 4$, and the
root mean squared mass fluctuations at late times are $\delta\rho/\rho \propto M^{-7/6}$.

32.2 Fluctuations from Inflation 

A more promising way to generate density fluctuations is from quantum fluctuations in the scalar
field driving inflation. In chapter 29 we explored a model of ‘chaotic inflation’ in which the inflaton
field has potential $V(\phi) = \lambda\phi^4/\hbar c$. We found that provided $\phi \gg \sqrt{c^4/G}$ (or $\phi \gg m_{\rm pl}$ in natural
units) the stress-energy tensor has $P = -\rho c^2$, as required to drive inflation. While it was not
stated there, we also assumed implicitly that the field was weakly self-interacting, $\lambda \ll 1$, and that
the field value was $\phi \ll \lambda^{-1/4}\sqrt{c^4/G}$, so that the density $\rho \sim V/c^2 \sim \lambda\phi^4/\hbar c^3$ is much less than the
Planck density. With these assumptions, the important scales in the inflaton-radiation-matter dominated
cosmology are as depicted in figure 29.1. It is a remarkable feature of the inflationary cosmology
that, in addition to solving the flatness, horizon and possibly other problems, it naturally predicts
that at late times, there will be density fluctuations re-entering the horizon with amplitude

$$ \delta_H \sim \sqrt{\frac{\hbar}{c}}\,\frac{H^2}{\dot\phi} \qquad (32.5) $$

where these quantities are evaluated as the perturbations leave the horizon during the inflationary
era. Since $H$ and $\dot\phi$ are slowly varying during inflation, this naturally predicts seeds for structure
formation close to the preferred Harrison-Zel’dovich form.

As usual, since we are dealing with small amplitude fluctuations, the natural approach is to
decompose the field into spatial Fourier modes, and compute the evolution of these separately. As
shown in figure 29.1, such a mode, being fixed in comoving wavelength, first appears, or rather
becomes describable without a quantum theory of gravity, when the physical wavelength is on the
order of the Planck length. As already discussed, for inflation to take place we require that the
field fluctuation at that time be in the vacuum state to very high accuracy. This then sets the
initial conditions; the initial occupation number for inflatons of this scale must vanish. A detailed
calculation of the evolution is extremely technical, involving such tricky issues as the nature of
the vacuum in curved space-time, as well as requiring a full general-relativistic treatment for the
modes while they are outside the horizon. Here we shall only give a rather hand-waving sketch
of the important processes and thereby physically justify the form of the key result (32.5). We
will show that the requirement that the final density fluctuation amplitude agree with that required
observationally puts very strong constraints on the strength of the interaction term (or mass term) in
the inflaton potential. We will also discuss how inflation predicts, in addition to density fluctuations,
fluctuations in all fields, and, in particular, predicts a stochastic background of gravity waves. This
provides a potential test of the theory.

First, we need to establish the nature of the fluctuations about the large-scale average inflaton
field during inflation. We will denote the ‘background’ field by $\phi_0$, and the fluctuations, which, as
we shall see, are relatively small, by $\phi_1$. The general equation of motion for the inflaton field is

$$ \ddot\phi + 3H\dot\phi - \frac{c^2}{a^2}\nabla^2\phi + 4\lambda\frac{c}{\hbar}\phi^3 = 0, \qquad (32.6) $$


where $\nabla$ denotes the derivative with respect to comoving coordinates. If we decompose the field
as $\phi = \phi_0 + \phi_1$, where the ‘background’ field $\phi_0$ is assumed to have $\nabla\phi_0 = 0$, and make a Taylor
expansion of the interaction term assuming that the fluctuations about the background are relatively
small (i.e. $\phi_1 \ll \phi_0$) then the equation of motion for the perturbation, which will not in general have
small spatial gradient, is

$$ \ddot\phi_1 + 3H\dot\phi_1 - \frac{c^2}{a^2}\nabla^2\phi_1 + 12\lambda\frac{c}{\hbar}\phi_0^2\phi_1 = 0. \qquad (32.7) $$


Compare this with the equation of motion for a free massive scalar field

$$ \ddot\phi + 3H\dot\phi - \frac{c^2}{a^2}\nabla^2\phi + \frac{m^2c^4}{\hbar^2}\phi = 0. \qquad (32.8) $$


Evidently, the fluctuations about the background field behave like a free field with mass

$$ m = \sqrt{12\lambda\hbar\phi_0^2/c^3}. \qquad (32.9) $$

The Compton wavelength for the inflaton fluctuations is

$$ \lambda_c = \hbar/mc \qquad (32.10) $$


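which we can compare to the horizon scale $c/H$. With $H \sim \sqrt{G\rho} = \sqrt{G\epsilon/c^2} \sim \sqrt{G\lambda\phi_0^4/\hbar c^3}$, the
ratio of these is

$$ \frac{\lambda_c}{c/H} \sim \sqrt{\frac{G\phi_0^2}{c^4}}. \qquad (32.11) $$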


Therefore, if the field is large enough to allow inflation ($\phi_0 \gg \sqrt{c^4/G}$) then $\lambda_c \gg c/H$; the Compton
wavelength is much larger than the horizon. Thus, to a very good approximation, the classical
equation governing fluctuations $\phi_1$ is that of a free, massless field. This result is not specific to the
$V \propto \phi^4$ form for the inflaton potential; the same is true for a $V \propto \phi^2$ theory or for other polynomial
potentials.
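
As a numerical sanity check, the ratio of Compton wavelength to horizon scale can be evaluated directly from the effective mass and the expansion rate. The sketch below is illustrative only: it assumes the estimate $H \sim \sqrt{G\lambda\phi_0^4/\hbar c^3}$ with order-unity factors dropped, keeps the factor 12 in the mass, uses SI values for the constants, and the function name and sample field values are ours.

```python
import math

HBAR = 1.054571817e-34   # J s
C = 2.99792458e8         # m / s
G = 6.67430e-11          # m^3 kg^-1 s^-2

def compton_to_horizon_ratio(phi0, lam):
    """lambda_c / (c/H) for background field phi0 and coupling lam."""
    m = math.sqrt(12.0 * lam * HBAR * phi0**2 / C**3)      # effective mass of fluctuations
    lambda_c = HBAR / (m * C)                              # Compton wavelength
    hubble = math.sqrt(G * lam * phi0**4 / (HBAR * C**3))  # H ~ sqrt(G rho), factors dropped
    return lambda_c / (C / hubble)

phi_planck = math.sqrt(C**4 / G)   # field value at which inflation becomes possible

for factor in (1.0, 10.0, 100.0):
    print(factor, compton_to_horizon_ratio(factor * phi_planck, lam=1e-14))
```

The coupling cancels out of the ratio, which depends only on $\phi_0$ in units of $\sqrt{c^4/G}$ and grows linearly with it, so the fluctuations become effectively massless once the field is large enough to inflate.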

The initial conditions are that the occupation number for these fluctuations must vanish: $n_k = 0$.
There are still zero-point fluctuations of the field with energy $E_k = \hbar\omega_k/2$, but these cannot
gravitate. We do not need to understand why this is so; we can simply take it as an empirical fact
that these zero-point fluctuations, which are present in all fields today, and which would predict
an energy density on the order of the Planck density of $\rho \sim 10^{95}\,{\rm gm\,cm^{-3}}$, are somehow not to
be included in the stress-energy tensor. In order to couple to real density fluctuations, these
zero-point quantum fluctuations must somehow develop non-zero occupation numbers. This is easily
seen to be inevitable. When the wavelength is much less than the horizon scale, the effect of the
universal expansion is negligible; the field fluctuations evolve just as they would in flat space with
adiabatically conserved occupation number $n_k = 0$. However, as the wavelength approaches the
horizon scale, the frequency of oscillations $\omega \sim c/\lambda$ approaches the Hubble expansion rate and
adiabaticity breaks down. Hopefully it is not too much of a stretch to accept that the result must
be occupation numbers of order unity as modes leave the horizon. We can then estimate the
amplitude of these now real inflaton fluctuations. The energy per mode is $E_k \sim \hbar\omega$, so the energy
density is $\epsilon \sim L^{-3}\sum_k \hbar\omega_k \sim \int d^3k \, \hbar\omega_k \sim \hbar k_*^3\omega_* \sim \hbar c^{-3}\omega_*^4$ where $\omega_* \sim H$. The stress-energy
tensor for a massless free field gives $\epsilon = \langle\dot\phi_1^2\rangle/2c^2 + \langle(\nabla\phi_1)^2\rangle/2$, but these terms are equal, so we have
$\epsilon \sim \langle\dot\phi_1^2\rangle/c^2 \sim \omega^2\langle\phi_1^2\rangle/c^2$. Equating these two expressions for the energy density, and using $\omega \sim H$
for these trans-horizon scale modes, we find that the field variance is

$$ \langle\phi_1^2\rangle \sim \frac{\hbar H^2}{c}. \qquad (32.12) $$


Before proceeding, it is interesting to note that the prediction for the field fluctuations derived
here is the same as for a field in thermal equilibrium at a temperature $kT \sim \hbar H$. For such a
field, the occupation number is exponentially small for wavenumbers $k = \omega_k/c \gg kT/\hbar c$, and most
of the field variance comes from modes with $k \sim kT/\hbar c$. An alternative route to (32.12) is to show
that the existence of the event horizon during inflation results in Hawking radiation of temperature
$kT \sim \hbar H$.

The clear prediction, and this is made more quantitative in the full treatment, is for inflaton
field fluctuations at horizon exit of amplitude $\delta\phi \sim \sqrt{\hbar H^2/c}$. To understand how these couple to
density and curvature fluctuations at the end of inflation, and subsequently to density fluctuations
at horizon re-entry, consider first a region where $\delta\phi$ happens to be small. This region will inflate by
a certain number of e-foldings, and will then re-heat to a density determined solely by the nature
of the inflaton potential and its couplings to other fields. Now consider a region of the same initial
size, but where the field fluctuation happens to be positive. The field in this region starts ‘higher
up the hill’, so this region inflates for slightly longer, and ends up occupying a slightly larger volume
when it re-heats (to the same density as the unperturbed region). The extra expansion factor is
$\exp(H\delta t)$, where $\delta t$ is the time taken for the field to roll from $\phi = \phi_0 + \delta\phi$ to $\phi = \phi_0$. This is just
$\delta t = \delta\phi/\dot\phi$. For small $\delta\phi$ we can expand the exponential as $\exp(H\delta t) \simeq 1 + H\delta t \simeq 1 + H\delta\phi/\dot\phi$. This
is the excess of volume occupied by the perturbation region as compared to what it would have been
had $\delta\phi$ been zero; clearly to replace a given volume in the background model by a slightly larger
volume requires that the perturbed region have slightly positive curvature, which, as we have seen,
can be related to the Newtonian potential fluctuations. Alternatively, for a large-scale perturbation
which enters the horizon after matter domination, there will be an excess of proper mass within the
perturbed region $\delta M/M \sim H\delta t$, which is just equal to the Newtonian potential fluctuation, which
again is just equal to the density fluctuation at horizon re-entry. Either route yields a prediction for
the late-time horizon scale density fluctuation given by (32.5).


The late-time density fluctuation amplitude is therefore set by the values of $H$ and $\dot\phi$ at horizon
exit. Since the field will be rolling slowly at the terminal velocity $\dot\phi \sim \lambda c\phi^3/\hbar H$, these will vary
slowly with wavelength, so the prediction is for a spectrum with index $n$ close to unity. A further
consequence is that since the initial ‘zero-point’ fluctuations are statistically independent, so also
will be the complex amplitudes for the density fluctuations; i.e. the prediction is that the density
perturbations will take the form of a Gaussian random field.

Observations of the microwave background tell us that the density fluctuation at horizon re-entry
is around $\delta_H \sim 10^{-5}$. Matching this requirement places a constraint on the interaction strength
parameter $\lambda$ (or its equivalent for other choices of the inflaton potential form). Using $H^2 \sim G\epsilon/c^2 \sim
G\lambda\phi_0^4/\hbar c^3$ and $\dot\phi \sim \lambda c\phi^3/\hbar H$ we have

$$ \delta_H \sim \left(\frac{G\phi_0^2}{c^4}\right)^{3/2}\lambda^{1/2}. \qquad (32.13) $$

Now the first factor must be greater than unity for inflation to take place. In fact, we found that we
needed $\phi_0 \gg \sqrt{c^4/\epsilon G}$, where $\epsilon$ here denotes the inverse of the number of e-foldings required to solve the
horizon problem. This means that the pre-factor on the right hand side of (32.13) is around 100 (though
the precise value is dependent on the energy scale of re-heating), and therefore a viable model must
have a dimensionless interaction strength

$$ \lambda \sim 10^{-4}\,\delta_H^2 \sim 10^{-14}. \qquad (32.14) $$
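
The chain of estimates leading to the constraint on the coupling can be followed numerically. The sketch below is a rough illustration, not a substitute for the full calculation: it assumes the order-of-magnitude relations used in this section ($H^2 \sim G\lambda\phi_0^4/\hbar c^3$, terminal velocity $\dot\phi \sim \lambda c\phi_0^3/\hbar H$, $\delta\phi \sim \sqrt{\hbar H^2/c}$ and $\delta_H \sim H\delta\phi/\dot\phi$), all with order-unity factors dropped. The function names and the sample field value are ours.

```python
import math

HBAR = 1.054571817e-34   # J s
C = 2.99792458e8         # m / s
G = 6.67430e-11          # m^3 kg^-1 s^-2

def delta_H(phi0, lam):
    """Horizon-scale density fluctuation built up step by step."""
    hubble = math.sqrt(G * lam * phi0**4 / (HBAR * C**3))  # expansion rate
    phi_dot = lam * C * phi0**3 / (HBAR * hubble)          # slow-roll terminal velocity
    delta_phi = math.sqrt(HBAR * hubble**2 / C)            # rms fluctuation at horizon exit
    return hubble * delta_phi / phi_dot                    # extra expansion H * delta t

def delta_H_closed_form(phi0, lam):
    """The same quantity after eliminating H and phi_dot algebraically."""
    return (G * phi0**2 / C**4) ** 1.5 * math.sqrt(lam)

def required_coupling(delta_h, prefactor):
    """Solve delta_H ~ prefactor * sqrt(lambda) for lambda."""
    return (delta_h / prefactor) ** 2

phi0 = 20.0 * math.sqrt(C**4 / G)   # hypothetical field value, well above the Planck scale
print(delta_H(phi0, 1e-14), delta_H_closed_form(phi0, 1e-14))
print(required_coupling(1e-5, 100.0))
```

The step-by-step and closed-form values agree, and Planck's constant drops out of the final amplitude, which depends only on $\lambda$ and on $\phi_0$ in units of $\sqrt{c^4/G}$.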


Finally, while we have focused on the fluctuations in the inflaton field, since it is these which give
rise to density fluctuations, the first part of the argument here can be used to predict the horizon-exit
value of the amplitude of any fields which are effectively massless during inflation. In particular, the
theory predicts that there should be fluctuations in the graviton field (gravitational waves, that is)
with amplitude on the order of the expansion rate in units of the Planck frequency. These waves
are ‘frozen in’ while the perturbation is outside the horizon and then start to oscillate on horizon
re-entry. A high priority for future measurements of the microwave background anisotropies is to
measure the strength of these waves.


32.3 Self-Ordering Fields 

An alternative, and also highly attractive, possibility is that the seeds of structure may be due
to self-ordering fields. The idea here is to have some scalar field or such-like which is initially in a
highly disordered thermal state, but which has a potential function of the kind invoked in spontaneous
symmetry breaking. As the universe expands, the field temperature decreases and eventually it
becomes energetically favorable for the field to fall into the minimum of the potential function. Such
fields try to ‘comb themselves smooth’, but are frustrated in this due to the formation of topological
defects. The most common example of this phenomenon at low energies is a ferro-magnetic material
which, if cooled from high temperature, will undergo a phase transition and will develop domains
which are bounded by walls. In cosmology, as at low energies, the character and evolution of such
systems depends critically on the dimensionality of the field involved. Here we shall consider first
1-dimensional fields, which give rise to domain walls, but which unfortunately do not seem to be
consistent with the observed state of the universe. We then consider 2-dimensional fields, which, as we
shall see, give rise to cosmic strings, and which are a more promising mechanism for generating
cosmological structure.


32.3.1 Domain Walls 

Consider a real scalar field with Lagrangian density

$$ \mathcal{L}(\phi, \dot\phi, \nabla\phi) = \frac{1}{2c^2}\dot\phi^2 - \frac{1}{2}(\nabla\phi)^2 - V(\phi) \qquad (32.15) $$

with potential function

$$ V(\phi) = V_0 - \frac{m^2c^2}{2\hbar^2}\phi^2 + \frac{\lambda}{\hbar c}\phi^4 \qquad (32.16) $$

as sketched in figure 32.3. This field has a negative mass-squared term, and a self-interaction
parameterized by the dimensionless constant $\lambda$. The potential has a pair of minima at field values
$\phi = \pm\phi_0$, the solutions of $dV/d\phi = 0$, where

$$ \phi_0 = \sqrt{\frac{m^2c^3}{4\lambda\hbar}} \qquad (32.17) $$

and has an unstable maximum at $\phi = 0$. If we require that $V(\pm\phi_0) = 0$, the constant $V_0$ is related
to the parameters $m$, $\lambda$ by

$$ V(0) = V_0 = \frac{m^4c^5}{16\lambda\hbar^3}. \qquad (32.18) $$
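
The location of the minima and the value of the constant term can be checked numerically. The sketch below assumes that the potential has the quartic form $V = V_0 - (m^2c^2/2\hbar^2)\phi^2 + (\lambda/\hbar c)\phi^4$, works in natural units ($\hbar = c = 1$) by default, and uses arbitrary illustrative values for $m$ and $\lambda$; the function names are ours.

```python
import math

def v_zero(m, lam, hbar=1.0, c=1.0):
    """Height of the central hill, chosen so that V vanishes at the minima."""
    return m**4 * c**5 / (16.0 * lam * hbar**3)

def potential(phi, m, lam, hbar=1.0, c=1.0):
    """W-shaped potential with a negative mass-squared term and quartic self-interaction."""
    return (v_zero(m, lam, hbar, c)
            - m**2 * c**2 * phi**2 / (2.0 * hbar**2)
            + lam * phi**4 / (hbar * c))

def phi_min(m, lam, hbar=1.0, c=1.0):
    """Field value at the minima, from dV/dphi = 0."""
    return math.sqrt(m**2 * c**3 / (4.0 * lam * hbar))

def dV_dphi(phi, m, lam, h=1e-6):
    """Centered finite-difference derivative of the potential."""
    return (potential(phi + h, m, lam) - potential(phi - h, m, lam)) / (2.0 * h)

m, lam = 1.0, 0.1   # illustrative values only
p0 = phi_min(m, lam)
print(p0, potential(p0, m, lam), dV_dphi(p0, m, lam), potential(0.0, m, lam))
```

Both the potential and its slope vanish at $\phi = \pm\phi_0$, while the central maximum sits at $V_0 > 0$.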
Figure 32.3: The potential function for a real scalar field involved in the generation of domain walls.


Now a field in thermal equilibrium at temperature $T$ has energy per mode $E_k = (n_k + 1/2)\hbar\omega_k$
with $n_k = (e^{\hbar\omega_k/kT} - 1)^{-1}$. If the field is effectively free and massless, the dispersion relation is
simply $\omega_k = ck$. Ignoring the zero-point energy, and taking the universe to be a periodic box of size
$L$, the energy density is

$$ \epsilon(T) = L^{-3}\sum_{\mathbf{k}} n_k\hbar\omega_k = \int\frac{d^3k}{(2\pi)^3}\, n_k\hbar\omega_k \sim \frac{\hbar\omega_T^4}{c^3} \sim \frac{(kT)^4}{(\hbar c)^3} \qquad (32.19) $$

where $\omega_T = kT/\hbar$. This is just the Stefan-Boltzmann law. The energy density is also related to the
mean square field fluctuations by the stress-energy tensor: $\epsilon \sim \langle\dot\phi^2\rangle/c^2 \sim \omega^2\langle\phi^2\rangle/c^2$, so equating these
two expressions for $\epsilon$ gives the root mean square field fluctuation at temperature $T$:

$$ \langle\phi^2\rangle^{1/2} \sim \frac{kT}{\sqrt{\hbar c}}. \qquad (32.20) $$


This says that the typical field value for a field in thermal equilibrium is proportional to the
temperature (in natural units the root mean squared value of the field, which has dimensions of energy,
is just equal to $kT$). If we use this to compute the potential energy density $\epsilon_{\rm int}$ due to the interaction
term we find

$$ \epsilon_{\rm int} \sim \frac{\lambda}{\hbar c}\langle\phi^4\rangle \sim \frac{\lambda}{\hbar c}\langle\phi^2\rangle^2 \sim \frac{\lambda(kT)^4}{(\hbar c)^3}. \qquad (32.21) $$

Thus, provided the dimensionless interaction strength $\lambda$ is much less than unity, as we shall assume,
for a thermal state the interaction energy is a small perturbation to the total energy (32.19).

At high temperatures such that $kT \gg \sqrt{\hbar c}\,\phi_0$ the typical field values are much greater than
$\phi_0$ and the hill at the center of the potential is then relatively unimportant for the motion of the
field. However, as the universe expands the temperature and the field amplitude decrease until the
thermal field fluctuations become of order $\phi_0$, and below this temperature the field will be trapped
in one or other of the local minima. This phase transition occurs at a critical temperature

$$ kT_c \sim \sqrt{\hbar c}\,\phi_0 \sim \frac{mc^2}{\sqrt{\lambda}}. \qquad (32.22) $$
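
To get a feeling for the numbers, the critical temperature can be evaluated for a given mass scale and coupling. This is only a sketch: the rest energy and coupling passed in below are hypothetical, the function name is ours, and we keep the order-unity factor of $1/2$ that follows from taking $\phi_0^2 = m^2c^3/4\lambda\hbar$, although the estimate itself is only good to order of magnitude.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5   # Boltzmann constant in eV / K

def critical_temperature_kelvin(mc2_ev, lam):
    """T_c from kT_c ~ sqrt(hbar c) * phi_0 = m c^2 / (2 sqrt(lambda)).

    mc2_ev : rest energy m c^2 of the field quanta in eV (hypothetical input)
    lam    : dimensionless self-coupling
    """
    kT_c = mc2_ev / (2.0 * math.sqrt(lam))
    return kT_c / K_BOLTZMANN_EV

# Hypothetical example: a 1 GeV mass scale with a weak coupling of 0.01.
print(critical_temperature_kelvin(1.0e9, 0.01))
```

For fixed mass, a weaker coupling pushes the transition to higher temperature, since the minima then lie at larger field values.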


Now since the field configuration is initially highly spatially incoherent, different regions of space will
want to settle into different minima. What happens (see figure 32.5) is that the field will become
locally smooth within domains with value $\phi = \pm\phi_0$, since this minimizes the $(\nabla\phi)^2$ contribution
to the energy density, with domains separated by domain walls where there is a strong localized
gradient of the field. One can estimate the thickness of these walls on energetic grounds to be

$$ \Delta x \sim \frac{\phi_0}{\sqrt{V_0}} \qquad (32.23) $$

Figure 32.4: The upper panel shows the variation of the field passing through a domain wall of
thickness $\Delta x$. We can estimate the width, and surface mass density, of a domain wall as follows.
If the wall has width $\Delta x$ then the typical field gradient within the wall is $\nabla\phi \sim \phi_0/\Delta x$. The
energy density is then $\epsilon \sim (\phi_0/\Delta x)^2 + V_0$, so the density per unit area $\Sigma$ is given by $c^2\Sigma \sim \epsilon\Delta x \sim
\phi_0^2/\Delta x + V_0\Delta x$. If the wall is too thin, the gradient energy term becomes large, while if the wall is
too thick the potential term is increased. The total energy is minimized for $\Delta x \sim \phi_0/\sqrt{V_0}$. This is
the width of a stable domain wall, for which the mass surface density is $\Sigma \sim \phi_0\sqrt{V_0}/c^2$.


(see figure 32.4 and its accompanying caption), and the mass-energy surface density in a stable wall
is

$$ \Sigma \sim \phi_0\sqrt{V_0}/c^2. \qquad (32.24) $$
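
The energetic argument for the wall thickness amounts to minimizing $\phi_0^2/\Delta x + V_0\Delta x$ over $\Delta x$. The following sketch does this numerically with a simple ternary search (our choice of method, not something prescribed by the argument) and compares the result with $\phi_0/\sqrt{V_0}$; the values of $\phi_0$ and $V_0$ are arbitrary.

```python
import math

def wall_energy_per_area(dx, phi0, v0):
    """Gradient term phi0^2/dx plus potential term v0*dx (order-unity factors dropped)."""
    return phi0**2 / dx + v0 * dx

def minimize_unimodal(f, lo, hi, iterations=100):
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2   # minimum lies left of m2
        else:
            lo = m1   # minimum lies right of m1
    return 0.5 * (lo + hi)

phi0, v0 = 1.5, 4.0   # arbitrary illustrative values
dx_best = minimize_unimodal(lambda dx: wall_energy_per_area(dx, phi0, v0), 1e-3, 1e3)

print("numerical width:", dx_best)
print("phi0/sqrt(V0):  ", phi0 / math.sqrt(v0))
print("energy per area:", wall_energy_per_area(dx_best, phi0, v0))
```

At the minimum the gradient and potential terms contribute equally, and the energy per unit area is $2\phi_0\sqrt{V_0}$ in these units, in line with the surface density estimate above.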


The single static planar wall is highly idealized. The initial wall configuration will be highly
disordered. Again, energetic considerations tell us that the system will evolve to minimize the total
energy in the walls. A simply connected region bounded by a wall will tend to shrink. In doing so, 
it will convert the potential energy into kinetic energy, so we expect walls to be moving at speeds on 
the order of the speed of light. Such a region will shrink to zero size on a time scale of order its size 
divided by c. The energy released will propagate away as waves, but these will damp adiabatically, 
so between the walls the field will remain relatively smooth. The expectation then is that any 
regions smaller than the horizon scale ~ ct will disappear, but the field at separations bigger than 
the horizon scale will remain uncorrelated; the field dynamics will result in domains, at any time, 
on the order of the horizon size. In fact we expect a scaling solution where the field looks the same 
at any time save for scaling of the mean wall separation with the horizon scale. 

This behavior is illustrated, for a field in 2 dimensions, in figure 32.5. The equations for a scalar
field in 2 dimensions, with a W-shaped potential and with a weak damping term, were evolved
numerically using a simple centered algorithm. The initial field was a Gaussian random field with a
flat spectrum, aside from a smoothing with a small kernel to make the field smooth at the spatial
sampling scale. The initial field amplitude was somewhat higher than $\phi_0$, but the damping term
cools the field, which starts to separate into domains where $\phi = \pm\phi_0$. As the system evolves, enclosed
regions can shrink to zero size and then disappear with a release of energy in a circular out-going
wave. The scale of the walls gradually increases with time.
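
A computation of this kind can be sketched in a few dozen lines. The version below is our own minimal reconstruction, not the code used for the figure: it assumes units with $c = 1$ and $\phi_0 = 1$, the specific potential $V = (\phi^2 - 1)^2/4$, periodic boundaries, unsmoothed Gaussian white noise as the initial field, and a damping rate `gamma` chosen for convenience. Time and space derivatives are both centered differences.

```python
import random

def dV(phi):
    """Derivative of the W-shaped potential V = (phi^2 - 1)^2 / 4 (minima at +-1)."""
    return phi**3 - phi

def step(phi, phi_old, dt, dx, gamma):
    """One centered update of phi_tt = laplacian(phi) - gamma * phi_t - dV(phi)."""
    n = len(phi)
    new = [[0.0] * n for _ in range(n)]
    a = 1.0 + 0.5 * gamma * dt
    b = 1.0 - 0.5 * gamma * dt
    for i in range(n):
        for j in range(n):
            # Periodic 5-point Laplacian (index -1 wraps automatically).
            lap = (phi[(i + 1) % n][j] + phi[i - 1][j] +
                   phi[i][(j + 1) % n] + phi[i][j - 1] - 4.0 * phi[i][j]) / dx**2
            new[i][j] = (2.0 * phi[i][j] - b * phi_old[i][j]
                         + dt**2 * (lap - dV(phi[i][j]))) / a
    return new, phi

def energy(phi, phi_old, dt, dx):
    """Total kinetic + gradient + potential energy on the grid."""
    n = len(phi)
    total = 0.0
    for i in range(n):
        for j in range(n):
            vel = (phi[i][j] - phi_old[i][j]) / dt
            gx = (phi[(i + 1) % n][j] - phi[i][j]) / dx
            gy = (phi[i][(j + 1) % n] - phi[i][j]) / dx
            total += 0.5 * (vel**2 + gx**2 + gy**2) + 0.25 * (phi[i][j]**2 - 1.0)**2
    return total * dx**2

def run(n=24, steps=800, dt=0.05, dx=1.0, gamma=0.5, sigma=1.2, seed=0):
    """Evolve a random initial field; return (final field, initial energy, final energy)."""
    rng = random.Random(seed)
    phi = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(n)]
    phi_old = [row[:] for row in phi]   # zero initial velocity
    e_start = energy(phi, phi_old, dt, dx)
    for _ in range(steps):
        phi, phi_old = step(phi, phi_old, dt, dx, gamma)
    return phi, e_start, energy(phi, phi_old, dt, dx)
```

Calling `run()` and printing the sign of each cell of the returned field shows the domains directly; the drop in total energy reflects the damping term cooling the field as the walls straighten and enclosed regions collapse.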

If we say there is on the order of one wall, of area $\sim (ct)^2$, per horizon volume $(ct)^3$, the mean
mass-energy density in walls is

$$ \rho_{\rm walls} \sim \frac{\Sigma}{ct}. \qquad (32.25) $$

Figure 32.5: A set of frames from a computation of spontaneous symmetry breaking. The full
animation can be viewed at http://www.ifa.hawaii.edu/~kaiser/wavemovies

Figure 32.6: The potential function for a 2-component scalar field involved in the generation of cosmic
strings.

This is a serious problem, since the density of the matter, or radiation, in the universe is $\rho =
3H^2/8\pi G$, which scales as $1/t^2$. Thus the walls will rapidly come to dominate the universe, and one
would have very large density inhomogeneity on the horizon scale. This is not what is observed.
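
The linear growth of the wall contribution can be made explicit. The sketch below compares the estimate of one wall per horizon volume, $\rho_{\rm walls} \sim \Sigma/ct$, with $\rho = 3H^2/8\pi G$. It assumes a radiation-dominated expansion with $H = 1/2t$ (an assumption made here for definiteness), uses a hypothetical surface density, and the function names are ours.

```python
import math

G = 6.67430e-11      # m^3 kg^-1 s^-2
C = 2.99792458e8     # m / s

def wall_to_background_ratio(t, surface_density):
    """rho_walls / rho at cosmic time t (seconds), surface density in kg / m^2."""
    rho_walls = surface_density / (C * t)                  # one wall per horizon volume
    hubble = 1.0 / (2.0 * t)                               # assumed radiation-era H = 1/(2t)
    rho_background = 3.0 * hubble**2 / (8.0 * math.pi * G)
    return rho_walls / rho_background

def domination_time(surface_density):
    """Time at which the walls match the background density."""
    return 3.0 * C / (32.0 * math.pi * G * surface_density)

sigma = 1.0   # hypothetical surface density in kg / m^2
t_dom = domination_time(sigma)
print(t_dom, wall_to_background_ratio(t_dom, sigma))
```

Because the ratio grows in proportion to $t$, any wall network with a non-negligible surface density eventually dominates.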


32.3.2 Cosmic Strings 

Now consider a two-component scalar field $\phi$ for which the potential $V(\phi)$ is the two-dimensional
analog of (32.16), as illustrated in figure 32.6. This is often called a ‘Mexican-hat’ or ‘sombrero’
potential. The minimum energy is on the circle $|\phi| = \phi_0$ and the field will try to relax towards this.
There will be oscillations about the minimum, but the amplitude of these decreases adiabatically and
the field will develop regions where the field lies in the minimum and varies slowly with position.
While a completely uniform field is energetically favored, just as for domain walls, the assumed
initial incoherence of the field limits the scale of coherence; the formation of a single infinitely large
domain is frustrated by the formation of a network of cosmic strings, which are localized regions of
energy density where the field sits at $\phi = 0$.

To get an idea of the topology of the initial string network, picture the initial field as filtered
white noise with some coherence length set by the filter, and model the initial field evolution as
simply rolling ‘downhill’ to the nearest minimum. In most places the field will vary quite smoothly
with position, with $\nabla\phi \sim \phi_0/\lambda$, where $\lambda$ is the coherence length for the initial field. However, at
positions where both components of the field $\phi_1$ and $\phi_2$ were initially very small there will be very
large gradients, and therefore very high energy density, localized near the regions where $\phi = 0$
initially. Now in 3-dimensional space the field $\phi_1$ will generally vanish on a surface, and similarly for
$\phi_2$, so the regions where both components vanish are the intersections of these surfaces; i.e. lines,
or ‘strings’. There is another way of looking at this: if we traverse an arbitrary loop in the real three
dimensional space, the field moves along a closed trajectory in 2-dimensional field space. If the field
is trapped in the circular trough then it is possible that the field trajectory will pass once around
the brim of the sombrero; we would say that this loop has a winding number of one (or minus one,
depending on the sense of rotation of the field). Now this winding number is a topological invariant;
we can make a continuous deformation of the loop and the winding number cannot change, provided
the field is everywhere confined to the minimum energy circle. A loop with non-zero winding number
therefore cannot be shrunk to a point without crossing a place where the field leaves the minimum,
so it must encircle a string.
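
The winding-number argument can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names, the circular test loops and the choice $\phi_0 = 1$ are all illustrative assumptions. It accumulates the change in the phase of $(\phi_1, \phi_2)$ around a closed loop and divides by $2\pi$.

```python
import math

def winding_number(field, loop):
    """Net number of turns made by a 2-component field around a closed loop of points."""
    phases = [math.atan2(phi2, phi1) for phi1, phi2 in (field(x, y) for x, y in loop)]
    total = 0.0
    for a, b in zip(phases, phases[1:] + phases[:1]):
        # wrap each phase step into [-pi, pi) so only continuous changes accumulate
        total += (b - a + math.pi) % (2.0 * math.pi) - math.pi
    return round(total / (2.0 * math.pi))

def circle(cx, cy, radius, n=200):
    """A closed loop: n points on a circle (hypothetical test path)."""
    return [(cx + radius * math.cos(2.0 * math.pi * i / n),
             cy + radius * math.sin(2.0 * math.pi * i / n)) for i in range(n)]

def string_field(x, y):
    """Unit-winding configuration (phi1, phi2) = phi0 (x, y) / r, with phi0 = 1 assumed."""
    r = math.hypot(x, y)
    return (x / r, y / r)

def uniform_field(x, y):
    """A field that sits at the same point of the minimum everywhere."""
    return (1.0, 0.0)

w_enclosing = winding_number(string_field, circle(0.0, 0.0, 1.0))   # loop around the string
w_deformed = winding_number(string_field, circle(0.3, -0.2, 2.0))   # deformed loop, still around it
w_missing = winding_number(string_field, circle(5.0, 0.0, 1.0))     # loop that misses the string
w_uniform = winding_number(uniform_field, circle(0.0, 0.0, 1.0))    # no string at all
```

Deforming the loop leaves the result unchanged as long as the loop does not cross the axis, which is the invariance described above.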

Now the energy for this toy model is in fact divergent. What is energetically more favorable is
for the field to sit at $\phi = 0$ along the string axis, with the field falling to the potential minimum
within some distance, the string thickness $\Delta x$. We can estimate what this is as follows. Consider
a perfectly axi-symmetric field with unit winding number, and let’s assume to start with that the
field everywhere (except perhaps exactly on the axis) lies in the minimum. We can choose the spatial
coordinate axes such that the field is


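$$\begin{bmatrix} \phi_1 \\ \phi_2 \end{bmatrix} = \frac{\phi_0}{r} \begin{bmatrix} x \\ y \end{bmatrix}. \qquad (32.26)$$

The field gradient term in the energy density is

$$\frac{1}{2} \frac{\partial \phi_i}{\partial r_j} \frac{\partial \phi_i}{\partial r_j} = \frac{\phi_0^2}{2 r^2}. \qquad (32.27)$$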


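If we integrate this from $r_{\min}$ to $r_{\max}$ we find a contribution to the line density

$$\sigma \sim \frac{\phi_0^2}{c^2} \int_{r_{\min}}^{r_{\max}} \frac{dr}{r} = \frac{\phi_0^2}{c^2} \log\left(\frac{r_{\max}}{r_{\min}}\right) \qquad (32.28)$$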


so the line density diverges logarithmically if we let $r_{\min} \to 0$. Now consider a crude model in
which the field lies at zero potential for $r \gg \Delta x$ but has $\phi \simeq 0$ for $r \ll \Delta x$, with some smooth
transition between these. The gradient contribution to the line density will then contain a component
$\sigma \sim \phi_0^2 \log(r_{\max}/\Delta x)/c^2$ and there will be a contribution from the potential $\sigma \sim V_0 \Delta x^2/c^2$, so the
total line density will be

$$\sigma \sim \frac{1}{c^2} \left( \alpha \phi_0^2 \log(r_{\max}/\Delta x) + \beta V_0 \Delta x^2 \right) \qquad (32.29)$$

where $\alpha$, $\beta$ are dimensionless coefficients of order unity. Setting the derivative of this with respect
to $\Delta x$ to zero gives the string width for minimum line density (i.e. energy)

$$\Delta x \sim \frac{\phi_0}{\sqrt{V_0}} \qquad (32.30)$$

just as above for domain walls. The linear mass-density is

$$\sigma \sim \frac{V_0 \Delta x^2}{c^2} \sim \frac{\phi_0^2}{c^2}. \qquad (32.31)$$


The generation of this network of strings during a phase-transition involving a two-component field 
is known as the Kibble mechanism. 
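
The minimization of the line density (32.29) with respect to the string width is easy to check numerically. The sketch below is an illustration only: it assumes units with $c = 1$, sets $\alpha = \beta = 1$, and uses arbitrary values for $\phi_0$, $V_0$ and $r_{\max}$. The golden-section search is a generic one-dimensional minimizer, not something taken from the text.

```python
import math

def line_density(dx, phi0, v0, r_max=1.0e3, alpha=1.0, beta=1.0):
    """Crude line density (32.29) in units with c = 1 (alpha, beta, r_max are assumed values)."""
    return alpha * phi0**2 * math.log(r_max / dx) + beta * v0 * dx**2

def minimise(func, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if func(c) < func(d):
            b = d      # minimum lies in [a, d]
        else:
            a = c      # minimum lies in [c, b]
    return 0.5 * (a + b)

phi0, v0 = 2.0, 9.0    # arbitrary illustrative values
dx_numeric = minimise(lambda dx: line_density(dx, phi0, v0), 1.0e-3, 10.0)

# Setting d(sigma)/d(dx) = 0 by hand gives dx = phi0 * sqrt(alpha / (2 beta V0)).
dx_analytic = phi0 * math.sqrt(1.0 / (2.0 * v0))

# The width in units of phi0 / sqrt(V0) is a pure number of order unity.
width_ratio = dx_numeric * math.sqrt(v0) / phi0
```

The ratio `width_ratio` comes out as $1/\sqrt{2}$ whatever values are chosen for $\phi_0$ and $V_0$, and it does not depend on $r_{\max}$, which is the content of $\Delta x \sim \phi_0/\sqrt{V_0}$.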

Just as for walls, the initial string network will be quite contorted. Calculating the stress-energy 
tensor (or more simply applying energetics arguments) again tells us that the string network will 
not be static but will develop transverse velocities ~ c. The character of the evolution of the string 
network is qualitatively different, however. Strings can reconnect when they intersect and so loops 
can be chopped off the network. Such a loop may further intersect itself, but there are stable loop 
configurations which sit there and oscillate. Such loops have large quadrupole moments and are 
moving relativistically, so they are quite efficient at emitting gravitational radiation. One can show
that such loops will decay after $\sim c^2/G\sigma$ oscillations.

It is reasonable to expect that such a network will evolve towards a scaling solution with roughly 
one long string per horizon volume (that being the distance a string section will typically move). If 
we estimate the mean density in such strings we find 


$$\rho_{\rm string} \sim \frac{\sigma}{(ct)^2}. \qquad (32.32)$$


This is quite different from the case of walls where the density falls as $1/t$; here the string density
evolves in the same manner as the mean density of matter or radiation, whichever happens to 
dominate. Thus we expect to have a constant fraction of the total energy density in string at 
any time. That the system should tend towards the scaling solution seems very reasonable — if 
there were too much string in some region then the interconnection would be more vigorous than
on average and vice-versa — and early simulations of the evolution of the string network were 
performed and seemed to confirm this. This led to a simple picture of a continuously evolving 
network of long strings with a debris of oscillating loops lying around (whose mass spectrum could 
be crudely estimated from the dynamics of loop production) and it was supposed that the loops 
would act as point-like ‘seeds’ for structure formation. In this picture the density fluctuations would 
be highly non-Gaussian, in contrast to the fluctuations arising from inflation for instance. However, 
subsequent higher resolution simulations showed that this picture was somewhat flawed. The simple 
intuitive expectation (and low resolution simulations) did not incorporate an important feature; 
each time strings chop, discontinuities form and propagate along the string as traveling waves. As 
time proceeds the network develops more and more fine scale structure. It is still suspected that a 
scaling solution will result, but performing the needed simulations is quite a challenge. Analysis of
the higher resolution simulations suggests that the simple one loop-one object picture for structure
formation was overly simplified and that the myriad of rapidly moving loops produces something
more akin to Gaussian fluctuations.

Perhaps the nicest feature of the string model is that it has only one free parameter:
the line-density of the strings $\sigma$. This sets the amplitude of density fluctuations at horizon crossing.
We can estimate this as follows. The total density is $\rho_{\rm tot} \sim H^2/G$, so the ratio of string to total
density is

$$\frac{\rho_{\rm string}}{\rho_{\rm tot}} \sim \frac{\sigma G}{c^2 H^2 t^2} \sim \frac{G\sigma}{c^2} \sim \frac{G\phi_0^2}{c^4} \sim \frac{G (kT_c)^2}{c^4 \hbar c} \sim \left(\frac{kT_c}{E_{\rm pl}}\right)^2 \qquad (32.33)$$


where we have used $\sigma \sim \phi_0^2/c^2$, $E_{\rm pl} \sim \sqrt{\hbar c^5/G}$ and $kT_c \sim \sqrt{\hbar c}\,\phi_0$. Now the energy density
fluctuations in the strings are of order unity at the horizon scale, there being on the order of one
string per horizon, and therefore the total density perturbation at horizon crossing is

$$\frac{\delta\rho}{\rho} \sim \frac{\rho_{\rm string}}{\rho_{\rm tot}} \sim \left(\frac{kT_c}{E_{\rm pl}}\right)^2. \qquad (32.34)$$


The gravity associated with the string network drives motions of the rest of the matter and thus 
excites growing density perturbations which could plausibly account for the structure we see. This is 
very encouraging. First, the theory naturally generates perturbations with scale invariant amplitude 
at horizon crossing; the Harrison-Zel’dovich spectrum. Second, for strings formed at around the GUT
scale of $kT_c \sim 10^{16}\,{\rm GeV} \sim 10^{-3} E_{\rm pl}$, this predicts $\delta \sim 10^{-6}$, which is not far from that observed.
Unfortunately, while the formation of strings at the GUT time is not mandatory, the formation of
monopoles is, and these monopoles are a disaster. They can be gotten rid of by inflation (indeed one
major motivation for inflation was the monopole problem) but then one would inflate away the
strings as well.
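
The order-of-magnitude estimate (32.34) can be reproduced in a few lines. This is a sketch only: the SI values of the constants are standard rounded figures, the function names are illustrative, and the result should be read as an order of magnitude rather than a prediction.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 2.99792458e8         # speed of light, m / s
G = 6.67430e-11          # Newton's constant, m^3 kg^-1 s^-2
GEV = 1.602176634e-10    # one GeV in joules

def planck_energy_gev():
    """E_pl = sqrt(hbar c^5 / G), expressed in GeV."""
    return math.sqrt(HBAR * C**5 / G) / GEV

def horizon_scale_perturbation(kT_c_gev):
    """Order-of-magnitude density perturbation (kT_c / E_pl)^2 from (32.34)."""
    return (kT_c_gev / planck_energy_gev()) ** 2

e_pl = planck_energy_gev()                      # roughly 1.2e19 GeV
delta_gut = horizon_scale_perturbation(1.0e16)  # strings formed at the GUT scale
```

With $kT_c = 10^{16}$ GeV this gives a perturbation amplitude just below $10^{-6}$, as quoted above.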

An interesting feature of the negative tension is that the stress-energy tensor for an infinite static
string is trace-free and consequently the string produces no tidal field. Outside of such a string
spacetime is flat, but it is topologically different from ordinary Minkowski space in that there is a
small deficit in azimuthal angle $\delta\varphi = 8\pi G\sigma/c^2$. A particularly distinctive feature of the cosmic
string model arises via gravitational lensing; lensing by long strings can produce a unique signature
both in images of distant galaxies and in the microwave background.

There are other defects which can form. We have already discussed formation of walls from 
1-dimensional fields, and monopoles from three dimensional fields, both of which are dangerous. A 
four-component field is more benign and gives rise to texture. A texture is not a topologically stable 
defect. Textures are most easily pictured in 1-D, where they result from having a 2-component
field, and such a texture can shrink until $(\nabla\phi)^2 \sim V(0)$ at which point it will unwind.
Chapter 33


Probes of Large-Scale Structure 


33.1 Introduction 

The inflationary/CDM model makes quite definite predictions for the power spectrum of the density 
perturbations emerging from the early universe, and also that the fluctuations should take the form 
of a Gaussian random field. On small scales, structures have gone non-linear, resulting in galaxies
and groups and clusters of galaxies. In many ways the cleanest tests of cosmological structure
formation theory come from measurements of large-scale structure, where the fluctuations are still
in the linear regime ($\delta\rho/\rho \ll 1$), and thus directly reveal the initial conditions. There are four classic
probes of large-scale structure: galaxy clustering (§33.2); deviations from pure Hubble expansion, or
bulk-flows (§33.3); anisotropy of the microwave background (§33.4); and the distortion of the shapes
of faint galaxies from weak lensing (§33.5). Rather than try to present a snap-shot of the current
results in each of these fields, which would rapidly become out-dated, we will focus on the basic
physical principles and set out the key strengths and weaknesses of the different probes.


33.2 Galaxy Clustering 

In some ways, galaxy clustering is the most straightforward probe of large-scale structure. The 
galaxies are like a dust of test-particles which flow with the matter in general and their spatial 
distribution thereby traces the underlying total density. By measuring the statistical properties 
of the galaxy distribution — auto-correlation function, power spectrum and higher order statistics 
— we can test the predictions of early universe theory. There are two basic techniques that are 
used here. One is to use angular surveys, where one measures the angular positions of galaxies and
extracts for example $w(\theta)$, the two-point angular galaxy correlation function. The other is to use
redshift surveys, where one has the three-dimensional distribution of galaxies, with distance being
inferred from the redshift. The advantage of angular surveys is that they are cheap and deep,
particularly with the advent of large-format CCD detectors. The disadvantage is that the structure
tends to get washed out in projection. Also, because we see a large number of structures along
any line of sight, the projected fluctuations become Gaussian by virtue of the central limit theorem,
making it hard to test for primordial non-Gaussianity. Redshift surveys avoid the latter problems,
but are much more expensive to obtain: with imaging alone one can determine the positions of on the order of $10^5$
galaxies in a 30 minute observation with a large telescope, reaching magnitude $m_R \sim 26$-$27$, whereas the
deepest redshift surveys currently feasible are restricted to $m_R \lesssim 24$, and take many hours per
field, with a much smaller field of view. A complication in analysing such surveys is that the redshift
does not strictly measure the distance; the peculiar velocities associated with the growing structure 
result in redshift space distortion, though, as we shall see, we can turn this to our advantage and 
use it as a way to determine the density parameter. 
33.2.1 Redshift Surveys

The technique here is to take images of a field to determine the locations of the galaxies in some 
chosen range of magnitudes, and then to obtain optical spectra, this typically being done using 
either a multi-slit spectrograph or using optical fibres to conduct the light from the focal plane to 
a single spectrograph. State of the art systems are capable of collecting several hundred spectra 
simultaneously. The result of this exercise is a catalog containing angular positions and redshifts for 
all of the galaxies in the chosen region of the sky which meet the selection criterion. 

33.2.2 Poisson Sample Model 

In order to analyze this kind of catalog and relate the results to theoretical predictions we need 
to have some kind of model for how galaxies are related to the underlying density field. Now one 
possibility is a ‘what-you-see-is-what-you-get’ model, where the luminous and dark matter densities 
are precisely the same. However, this is not really viable; first of all, we know empirically from 
rotation curve studies and from gravitational lensing that galaxies have extended haloes. Evidently 
the luminous matter is much clumpier than the dark matter on small scales. This property is also 
readily understood physically; when galaxy halo sized objects form, the baryons can dissipate their 
binding energy quite efficiently via collisional effects (see later), and so can sink to the bottom of the
potential well, while the dark matter, being like a collisionless fluid, cannot contract because of phase-space
density conservation. Second, there is good reason to believe that the luminosity of a galaxy is
not simply proportional to the amount of baryons in the region from which it forms. There is almost
certainly a strong stochastic element to the ratio of luminosity to baryon mass. Young galaxies are
very bright, and then fade, and galaxies may undergo subsequent bursts of star formation. These
considerations motivate the model for the relation between galaxies and the underlying, or total
mass, density field, known as the Poisson sampling model. One way to visualize this model is to
consider a sea of particles so numerous that they have an effectively continuous number density $\rho(\mathbf{r})$.
Now assign to each particle a uniformly distributed independent random number $p$ in the range
$0 < p < 1$ and paint those particles with $p < \epsilon$ red, where $\epsilon$ is some small constant. The red particles
define a Poisson sample of the continuous field $\rho(\mathbf{r})$.
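
The ‘paint the particles red’ construction translates directly into code. The following one-dimensional sketch is illustrative only: the sinusoidal density profile, the particle numbers, the value of $\epsilon$ and all function names are assumptions made for the example.

```python
import math
import random

def make_sea(n_particles, rng, amplitude=0.5):
    """Dense 'sea' of particles on [0, 1) with density proportional to
    1 + amplitude * cos(2 pi x), generated by rejection sampling."""
    sea = []
    while len(sea) < n_particles:
        x = rng.random()
        if rng.random() * (1.0 + amplitude) < 1.0 + amplitude * math.cos(2.0 * math.pi * x):
            sea.append(x)
    return sea

def poisson_sample(sea, epsilon, rng):
    """Paint each particle red independently with probability epsilon; return the red ones."""
    return [x for x in sea if rng.random() < epsilon]

def dense_fraction(points):
    """Fraction of points lying in the overdense half of the box (|x| < 1/4, periodically)."""
    return sum(1 for x in points if x < 0.25 or x >= 0.75) / len(points)

rng = random.Random(42)
sea = make_sea(200_000, rng)
galaxies = poisson_sample(sea, 0.01, rng)

# For this density profile the overdense half should hold a fraction 1/2 + 1/(2 pi) of the points.
expected_fraction = 0.5 + 0.5 / math.pi
sea_fraction = dense_fraction(sea)
galaxy_fraction = dense_fraction(galaxies)
```

The sparse red sample follows the same density profile as the sea, but with much larger fractional fluctuations because it contains far fewer points.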

Mathematically one can formalize this by imagining space to be divided into tiny cubical cells
with labels $i$ having volume $\delta V_i$ and occupation number $n_i$. The mean occupation number is

$$\bar n_i = \epsilon \rho(\mathbf{r}_i)\,\delta V_i. \qquad (33.1)$$

The probability that a cell is occupied is

$$P(n_i = 1) = \bar n_i \qquad (33.2)$$

and, in the limit that the cells become infinitesimal, the probability that $n_i > 1$ becomes negligible,
so the probability that the cell is empty is

$$P(n_i = 0) = 1 - \bar n_i. \qquad (33.3)$$

So far galaxies have been modelled as identical, featureless points. The Poisson sample model 
can be extended to incorporate a distribution of intrinsic luminosities or other characteristics. If the 
luminosity function (the distribution function for luminosities) is $\phi(L)$ then the extension of the
Poisson sample model is that the probability that a cell in the 4-dimensional position-luminosity
space is occupied is

$$P(n = 1) = \epsilon \rho(\mathbf{r}) \phi(L)\, d^3V\, dL. \qquad (33.4)$$

This model says that one draws particles at random from the quasi-continuous sea, and then assigns 
each of them a luminosity drawn at random from the luminosity function. 

The Poisson sample model underlies most observational work in galaxy clustering — so much 
so that the adoption of this model is rarely explicitly stated in the literature. However, we should 
remember that it is only a model, and one that cannot be strictly correct. It is the opposite 
extreme from the ‘light traces mass exactly’ WYSIWYG model described above. Reality probably
lies somewhere in between. 
33.2.3 Correlation Functions

In this model, the probability that two particular cells (labelled 1 and 2), separated by $r_{12}$, are
both occupied is just the product of the probabilities that each cell is occupied:
$P(n_1 = 1, n_2 = 1) = \epsilon^2 \delta V^2 \rho(\mathbf{r}_1)\rho(\mathbf{r}_2)$. We are ignoring here, for simplicity, the distribution of luminosities and are
treating galaxies as identical featureless points. The mean value of the product of cell occupation
numbers taken over all pairs of cells with separation $r_{12}$ is then

$$\overline{n_1 n_2} = \epsilon^2 \rho(\mathbf{r}_1)\rho(\mathbf{r}_2)\,\delta V_1 \delta V_2. \qquad (33.5)$$

This is, if you like, the average taken over an ensemble of realizations of points generated by sampling
a single given density field $\rho(\mathbf{r})$. Now the density field $\rho(\mathbf{r})$ is itself a random process and Nature, by
creating us at a particular location in space, has provided us with a specific realization of this field
in the region around us that we can observe. Learning about the specific density field configuration
around us is interesting, of course, but for testing theories of structure formation, we are more
interested in learning about the statistical properties of the ensemble of all density fields. Under the
Copernican hypothesis (that we are placed at a random location in a statistically homogeneous
random universe) these properties can be encoded in the hierarchy of correlation functions. A
nice feature of the Poisson sample model is that it makes it relatively easy to relate the correlation
function of the underlying density field $\rho(\mathbf{r})$ to counts of galaxies.

Consider the ensemble average of the product of the occupation numbers for a pair of cells

$$\langle n_1 n_2 \rangle = \epsilon^2 \langle \rho(\mathbf{r}_1)\rho(\mathbf{r}_2) \rangle\,\delta V_1 \delta V_2. \qquad (33.6)$$

Defining the fractional density perturbation $\delta(\mathbf{r})$ such that $\rho(\mathbf{r}) = \bar\rho(1 + \delta(\mathbf{r}))$, and noting that $\langle\delta\rangle = 0$, this becomes

$$\langle n_1 n_2 \rangle = \epsilon^2 \bar\rho^2 \delta V^2 (1 + \xi(r_{12})) \qquad (33.7)$$

where $\xi(r_{12}) = \langle \delta(\mathbf{r})\delta(\mathbf{r} + \mathbf{r}_{12}) \rangle$ is the auto-correlation function of the density fluctuation field. If
there were no underlying structure ($\xi(r) = 0$) the expectation of the product of cell numbers would
therefore be $\langle n_1 n_2 \rangle = \epsilon^2 \bar\rho^2 \delta V^2$. From a purely observational standpoint, the two point correlation
function $\xi(r)$ measures the fractional excess probability that a pair of cells are both occupied, given
that they have a separation $r_{12}$. It also measures the fractional excess probability that an observer
living on a galaxy has a neighbour in a cell at that relative location as compared to an observer at
a randomly chosen location (problem ??).

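Now we can estimate $\xi(r)$ by counting pairs with an appropriate separation. Obviously we must
average over some range of separation in order to get a sensible number of pairs. Given some bounded
region with cells labelled $i$ we can write the number of pairs of galaxies with separation $|\mathbf{r}_{ij}|$ in the
range $r - \Delta r/2 < |\mathbf{r}_{ij}| < r + \Delta r/2$ as a double sum over cells:

$$N_p(r, \Delta r) = \sum_i \sum_j n_i n_j S(|\mathbf{r}_{ij}|; r, \Delta r) \qquad (33.8)$$

where $S(|\mathbf{r}_{ij}|; r, \Delta r)$ is unity (zero) if the separation lies in (outside) the allowed range. Taking
the geometry of the observed region to be given, the expectation value of $N_p$ is

$$\langle N_p(r, \Delta r) \rangle = \sum_i \sum_j \langle n_i n_j \rangle S(|\mathbf{r}_{ij}|; r, \Delta r) \qquad (33.9)$$

and using (33.7) this becomes

$$\langle N_p(r, \Delta r) \rangle = \epsilon^2 \bar\rho^2 \sum_i \delta V_i \sum_j \delta V_j (1 + \xi(\mathbf{r}_{ij})) = \epsilon^2 \bar\rho^2 \int d^3 r_i \int d^3 r_j\, (1 + \xi(\mathbf{r}_i - \mathbf{r}_j)), \qquad (33.10)$$

where the sums and integrals are understood to be restricted to pairs with separations in the allowed range.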


Let’s assume that the range of separations is quite small, $\Delta r \ll r$, and that $\xi(\mathbf{r})$ depends only
on $r_{ij} = |\mathbf{r}_{ij}|$ and is such that, at separation $r$, the scale for changes in $\xi(r)$ is on the order of $r$;
i.e. $d\xi/dr \sim \xi/r$. Under these assumptions we can take $\xi(r_{ij})$ to be the constant value for the center
of the averaging bin $\xi(r)$, so

$$\langle N_p(r, \Delta r) \rangle = \bar n^2 (1 + \xi(r)) \int d^3 r_i \int d^3 r_j \qquad (33.11)$$

where $\bar n = \epsilon \langle\rho\rangle$ is the mean density of galaxies.

To convert this to a practical estimator for $\xi(r)$ we just need some way to estimate the double
spatial integral here. Since the survey volume is usually specified by giving a range of angular
coordinates on the sky, perhaps subject to a set of ‘bad regions’ (bright stars, data drop-outs etc),
this is most easily computed in a Monte-Carlo fashion by generating a very large set of points with
random independent positions and then applying a filter to restrict them to the survey region. Let
the mean number density of random objects be $n_{\rm rand}$ and that of real objects be $n_{\rm real} = n_{\rm rand}/\alpha$. We
denote the number of real pairs by $DD = N_{p,{\rm real}}$ (for ‘data-data’ pairs) and define $RR = N_{p,{\rm rand}}/\alpha^2$
(for ‘random-random’ pairs, but applying a dilution factor $1/\alpha^2$ to allow for their larger mean number
density). The expectation value of $DD$ is then

$$\langle DD \rangle = (1 + \xi(r)) RR. \qquad (33.12)$$

A fair correlation function estimator $\hat\xi(r)$, i.e. one for which $\langle \hat\xi(r) \rangle = \xi(r)$, is

$$\hat\xi(r) = \frac{DD}{RR} - 1. \qquad (33.13)$$

This is a simple, and often used estimator, but it has a number of drawbacks: 

• The mean density of galaxies must usually be determined from the actual density of
galaxies in the survey. This introduces a so-called integral constraint that the volume integral
of the two-point function estimator is forced to vanish. Since $\xi(r)$ tends to be positive at small
separations, this introduces a negative bias in $\hat\xi(r)$ at separations approaching the size of the
survey. This is largely unavoidable. One can try to model the effect, but this requires assuming
some form for $\xi(r)$ which doesn’t seem fair. The best solution is to obtain more data!

• The density fluctuations couple with the sharp boundary of the survey to produce spurious
results. Say the density field is rather smooth, with fluctuations on some ‘coherence-scale’ $r_c$.
If we slice through a structure the resulting density field has the appearance of sharp, or high
spatial frequency, structure and this will cause an error in measurements of $\xi(r)$ at small scales.
This is not a bias, since the error is as likely to be positive as negative, but is an unwanted
kind of ‘cross-talk’ between large-scale fluctuations and smaller scales which complicates the
error analysis. This problem can be ameliorated somewhat with an alternative estimator (see
below), or by ‘apodizing’ the survey; i.e. applying weights which taper off towards the edges.

• A proper error analysis is very difficult, since, as we shall see, the estimates of $\xi(r)$ for different
bins are not independent.
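
The simple estimator (33.13) can be sketched in a few lines for a one-dimensional ‘survey’ occupying the interval $[0, 1)$. Everything here is an illustrative assumption rather than a prescription from the text: the catalogue sizes, the toy clustered sample (tight clumps of five points) and the function names. Pairs are counted once each for both DD and RR, so the ratio is unaffected by the counting convention.

```python
import random

def count_pairs(points, r_lo, r_hi):
    """Number of distinct pairs whose 1-D separation lies in [r_lo, r_hi)."""
    pts = sorted(points)
    n_pairs = 0
    for i, xi in enumerate(pts):
        j = i + 1
        # points are sorted, so we can stop at the first neighbour that is too far away
        while j < len(pts) and pts[j] - xi < r_hi:
            if pts[j] - xi >= r_lo:
                n_pairs += 1
            j += 1
    return n_pairs

def xi_estimate(data, randoms, r_lo, r_hi):
    """Estimator (33.13): DD / RR - 1, with the random pair count diluted by alpha^2."""
    alpha = len(randoms) / len(data)
    dd = count_pairs(data, r_lo, r_hi)
    rr = count_pairs(randoms, r_lo, r_hi) / alpha**2
    return dd / rr - 1.0

rng = random.Random(1)
randoms = [rng.random() for _ in range(4000)]        # dense random catalogue
unclustered = [rng.random() for _ in range(400)]     # 'galaxies' with no clustering

clustered = []
for _ in range(80):                                  # toy model: 80 tight clumps of 5 points
    centre = rng.random()
    clustered.extend(centre + rng.uniform(-0.001, 0.001) for _ in range(5))
clustered = [x for x in clustered if 0.0 <= x < 1.0]  # apply the survey 'filter'

xi_flat = xi_estimate(unclustered, randoms, 0.04, 0.06)   # expected to be near zero
xi_clumpy = xi_estimate(clustered, randoms, 0.0, 0.002)   # expected to be well above zero
```

Because the random catalogue is passed through the same survey filter as the data, the edge effects of the bounded region cancel in the ratio, which is the point of the Monte-Carlo approach.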

The results of redshift surveys indicate that the galaxy-galaxy correlation function has a power-law
form $\xi(r) \simeq (r/r_0)^{-\gamma}$ for $r \lesssim 10$ Mpc, with $\gamma \simeq 1.8$. On larger scales the correlation function
departs from a power law.

Regarding the error analysis, it has often been suggested that one estimate the uncertainty in $\hat\xi(r)$
by some form of ‘bootstrap’ approach. An example is to take the actual data, but assign random
weights to the galaxies (e.g. one might discard half of the galaxies at random). The scatter
among an ensemble of such estimates provides some idea of the fluctuations of $\hat\xi(r)$ about the
true value, but is not usually what one wants. There are two elements of randomness in the generation
of a sample of galaxies. The first is the realization of a particular density field $\rho(\mathbf{r})$ from the ensemble
of all statistically equivalent initial conditions. Equivalently, by ergodicity, this is the randomness
arising from our particular randomly chosen location. The second is the randomness inherent in the
assumed Poisson sampling of the density field. Now on large scales the former tends to dominate,
while bootstrap methods are only sensitive to the latter, sub-dominant, type of uncertainty. The
latter type of error can best be thought of as a ‘measurement error’; just as in photometry, where we
have $1/\sqrt{N}$ fluctuations in the flux from the number of photons, here we have $1/\sqrt{N}$ fluctuations in
any estimate of $\delta(\mathbf{r})$ arising from the finite number of galaxies. On large scales, however, the number
of galaxies per ‘structure’ is very large. Superclusters, objects living right at the boundary between
the linear and non-linear regimes, contain hundreds or even thousands of galaxies. The $1/\sqrt{N}$
fluctuations arising from the finite number of structures, often called the sampling variance or cosmic variance,
are much larger than $1/\sqrt{N_{\rm gals}}$. Since, in a magnitude limited survey, the mean number
density of galaxies decreases with distance, this suggests that one should give more weight to more
distant galaxies. To understand the estimation of uncertainties in $\xi(r)$ and make these ideas more
precise we now turn to power-spectrum estimation.


33.2.4 The Power Spectrum 

The two-point correlation function $\xi(r)$ is the Fourier transform of the power spectrum $P(k)$ and
vice versa, so the two statistics are mathematically equivalent. One can estimate $P(k)$ as follows:

First take the Fourier transform of the galaxies, considering them to be the function consisting
of a set of $\delta$-functions:

$$f(\mathbf{k}) = \sum_{{\rm galaxies}\ g} e^{i\mathbf{k}\cdot\mathbf{r}_g} = \sum_{{\rm cells}\ i} n_i e^{i\mathbf{k}\cdot\mathbf{r}_i}. \qquad (33.14)$$

Multiplying this by its complex conjugate and taking the expectation value gives

$$\langle |f(\mathbf{k})|^2 \rangle = \sum_i \sum_j \langle n_i n_j \rangle e^{i\mathbf{k}\cdot(\mathbf{r}_i - \mathbf{r}_j)}. \qquad (33.15)$$


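Using (33.7), and assuming, for simplicity, that the mean number density of galaxies is spatially
constant, we have

$$\langle |f(\mathbf{k})|^2 \rangle = \sum_i \langle n_i^2 \rangle + \bar n^2 \sum_i \delta V_i \sum_{j \ne i} \delta V_j\, (1 + \xi(\mathbf{r}_i - \mathbf{r}_j))\, e^{i\mathbf{k}\cdot(\mathbf{r}_i - \mathbf{r}_j)} \qquad (33.16)$$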


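where we have realized that (33.7) applies only for $i \ne j$. Now since $n_i = 0$ or 1, $n_i^2 = n_i$ and hence
$\langle n_i^2 \rangle = \bar n\,\delta V_i$, and, converting the sums to integrals, we have

$$\langle |f(\mathbf{k})|^2 \rangle = \bar n \int d^3r\, W(\mathbf{r}) + \bar n^2 \int d^3r\, W(\mathbf{r}) \int d^3r'\, W(\mathbf{r}')\, (1 + \xi(\mathbf{r} - \mathbf{r}'))\, e^{i\mathbf{k}\cdot(\mathbf{r} - \mathbf{r}')} \qquad (33.17)$$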

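where $W(\mathbf{r})$ is the ‘window function’ describing the shape of the survey volume. Next, using
$W(\mathbf{r}) = (2\pi)^{-3} \int d^3k\, \tilde W(\mathbf{k}) e^{-i\mathbf{k}\cdot\mathbf{r}}$ and $\tilde W(\mathbf{k}) = \int d^3r\, W(\mathbf{r}) e^{i\mathbf{k}\cdot\mathbf{r}}$, we have

$$\langle |f(\mathbf{k})|^2 \rangle = \bar n \tilde W(0) + \bar n^2 |\tilde W(\mathbf{k})|^2 + \bar n^2 \int \frac{d^3k'}{(2\pi)^3} \int \frac{d^3k''}{(2\pi)^3}\, \tilde W(\mathbf{k}') \tilde W^*(\mathbf{k}'') \int d^3r\, e^{-i(\mathbf{k}' - \mathbf{k}'')\cdot\mathbf{r}} \int d^3z\, \xi(\mathbf{z})\, e^{i(\mathbf{k} - \mathbf{k}')\cdot\mathbf{z}} \qquad (33.18)$$

$$= \bar n \tilde W(0) + \bar n^2 |\tilde W(\mathbf{k})|^2 + \bar n^2 \int \frac{d^3k'}{(2\pi)^3} \int \frac{d^3k''}{(2\pi)^3}\, \tilde W(\mathbf{k}') \tilde W^*(\mathbf{k}'')\, (2\pi)^3 \delta(\mathbf{k}' - \mathbf{k}'') \int d^3z\, \xi(\mathbf{z})\, e^{i(\mathbf{k} - \mathbf{k}')\cdot\mathbf{z}} \qquad (33.19)$$

$$= \bar n \tilde W(0) + \bar n^2 |\tilde W(\mathbf{k})|^2 + \bar n^2 \int \frac{d^3k'}{(2\pi)^3}\, P(\mathbf{k} - \mathbf{k}')\, |\tilde W(\mathbf{k}')|^2. \qquad (33.20)$$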

Thus simply by taking the square of the transform of the positions of the galaxies we have obtained
a quantity whose expectation value is, in essence, a convolution of the power spectrum of the
density field $\rho(\mathbf{r})$ with a kernel which is the square of the transform of the window function. We
can identify the terms on the right hand side with a $\mathbf{k}$-independent ‘shot-noise’ contribution, the
transform of the sample volume squared, and finally a term arising from the actual fluctuations in
the underlying density field $\rho(\mathbf{r})$. The appearance of the convolving kernel is readily understood; if
the survey has size $L$ then we do not expect to be able to ‘resolve’ features in the power spectrum
with $\delta k \lesssim 1/L$. If we are interested in the spectrum at a spatial frequency $k \gg L^{-1}$,
then, since $\tilde W(\mathbf{k})$ has width $\Delta k \sim 1/L$, we can neglect the effect of the convolving
kernel and we have, as a fair estimator of the power spectrum (up to an overall normalization factor $\bar n^2 \int d^3r\, W^2(\mathbf{r})$),

$$\hat P(\mathbf{k}) = |f(\mathbf{k})|^2 - \bar n \tilde W(0) - \bar n^2 |\tilde W(\mathbf{k})|^2. \qquad (33.21)$$


To obtain this we took the transform of the galaxies, squared it, and then subtracted the
expectation of the power spectrum for random galaxies. This is very similar to the estimator of the
correlation function (33.13) above. Now there is an alternative. We could have subtracted the
transform of a random catalog of objects, suitably scaled, from $f(\mathbf{k})$ before squaring. If we
define the scaled transform of a very numerous catalog of random galaxies $f_{\rm rand}(\mathbf{k}) = \alpha^{-1} \sum e^{i\mathbf{k}\cdot\mathbf{r}}$
then we find that

$$\langle |f(\mathbf{k}) - f_{\rm rand}(\mathbf{k})|^2 \rangle = \bar n \tilde W(0) + \bar n^2 \int \frac{d^3k'}{(2\pi)^3}\, P(\mathbf{k} - \mathbf{k}')\, |\tilde W(\mathbf{k}')|^2. \qquad (33.22)$$

This estimator has somewhat superior performance. The analog for estimating the correlation function is

$$\hat\xi(r) = \frac{DD - 2DR + RR}{RR} \qquad (33.23)$$

where $DD$ and $RR$ are as before and where $DR$ is the number of data-random pairs (diluted by a factor $1/\alpha$). This estimator
is somewhat less affected by sharp edges of the survey volume.

The power-spectrum approach makes it easier to estimate the uncertainty. In inflationary models,
the density field is predicted to have Gaussian statistics; i.e. the different Fourier modes are
uncorrelated. The same is true for the spectrum of the galaxies. This means that the measured
‘raw’ power (before subtracting off the shot noise from the galaxy discreteness, that is) is some positive
speckly function with speckle width $\Delta k \sim 1/L$. The expectation value of the measured power is
the true power (plus the expected shot noise power), but there are fractional fluctuations about the
mean of order unity. The fractional fluctuations in the measured power for a band of power are at
the $1/\sqrt{N}$ level, where $N$ is the number of distinct Fourier modes in the band. For a cubical survey volume,
the spacing of such independent modes is just $\Delta k = 2\pi/L$ and the concept can also be made precise
for non-cubical survey volumes. Thus, in power spectrum estimation, and assuming Gaussian initial
conditions, error estimation is fundamentally a counting exercise.
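
Both the shot-noise level and the mode-counting argument can be illustrated with unclustered points in a one-dimensional periodic box, where the window term vanishes at the modes $k = 2\pi m/L$ and the expected raw power is just the number of galaxies. The sketch below is a toy under those assumptions; the box size, numbers of galaxies, modes and realizations are arbitrary choices.

```python
import cmath
import math
import random

def raw_power(points, k):
    """|f(k)|^2 with f(k) = sum over galaxies of exp(i k x), for 1-D positions x."""
    f_k = sum(cmath.exp(1j * k * x) for x in points)
    return abs(f_k) ** 2

def mean(values):
    return sum(values) / len(values)

def scatter(values):
    """Sample standard deviation."""
    mu = mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / (len(values) - 1))

rng = random.Random(7)
box, n_gal, n_real = 1.0, 200, 50
modes = [2.0 * math.pi * m / box for m in range(1, 21)]   # independent modes, spacing 2 pi / L

band_averages = []   # power averaged over the band of 20 modes, one value per realization
single_mode = []     # power in one mode only, one value per realization
for _ in range(n_real):
    galaxies = [rng.uniform(0.0, box) for _ in range(n_gal)]
    powers = [raw_power(galaxies, k) / n_gal for k in modes]   # in units of the shot noise
    single_mode.append(powers[0])
    band_averages.append(mean(powers))

mean_power = mean(band_averages)        # close to 1: pure shot noise, nothing else
mode_scatter = scatter(single_mode)     # of order unity for a single speckle
band_scatter = scatter(band_averages)   # close to 1/sqrt(20) for the band average
```

Averaging over the 20 modes in the band reduces the fractional scatter from order unity to roughly $1/\sqrt{20}$, which is the counting exercise described above.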

Comparing the two approaches, it is interesting to note that the correlation function estimator
is relatively insensitive to the survey geometry, whereas for the power spectrum the result is
convolved with a window function. This becomes a serious problem for e.g. pencil beam surveys,
where the kernel function becomes a thin disk (the transform of a needle). One consequence of 
this is that when one estimates the spectrum at some low spatial frequency, the result is dominated 
by high spatial frequency power being scattered down by the extended ‘side-lobes’ of the kernel 
function. However, for sensible survey geometries the advantage of being able to simply estimate 
the uncertainty in the power makes it the technique of choice. 

Power spectrum estimates from e.g. the SDSS and 2dF surveys are now probing well into the linear
regime. The theoretical prediction is that the power, which rises with decreasing spatial frequency
at small scales as $k^{-1.2}$, will peak at a scale corresponding to the horizon scale at $z_{\rm eq}$ and will then
decrease asymptotically as $P(k) \propto k$ (the $n = 1$ Harrison-Zel’dovich spectrum) towards very large
scales.


33.2.5 Redshift Space Distortion 

The power spectrum is a useful way to reveal a rather interesting property of the clustering pattern 
in redshift space. So far we have pretended that redshift simply measures distance. There are, 
however, peculiar velocities associated with the growth of structure which distort the distribution 
of galaxies as seen in redshift space. It is easy to understand the nature of the effect; consider an 
overdense spherical region. According to (31.15) the Hubble rate within the region will be reduced,
and consequently the volume of the region will appear smaller than it really is. Transverse dimensions
are unaffected by the velocity, so the effect is that an overdense sphere will appear squashed along
the line of sight.

We can make this more quantitative. Consider a single plane-wave density ripple with wave
vector $\mathbf{k}$ lying along the line of sight. Let the density contrast amplitude be $\delta_r$ (the subscript $r$
denoting real-space). This density wave is generated by a sinusoidal displacement $\Delta r = \Delta r_0 \cos(kr)$.
The density contrast is minus the derivative of the displacement, $-d\Delta r/dr = k \Delta r_0 \sin(kr)$, so the
displacement and density contrast amplitudes are related by $\Delta r_0 = \delta_r/k$. Now the continuity equation tells us
that the peculiar comoving velocity $\mathbf{u}$ is related to the perturbation amplitude by $\nabla\cdot\mathbf{u} = -\dot\delta$, so in amplitude
$u = \dot\delta/k$. The physical peculiar velocity is $v = au = a\dot\delta/k$, giving rise to a physical displacement
in redshift space $\Delta x = v/H$ and therefore a comoving displacement $\Delta r_z = \Delta x/a$, or $\Delta r_z = \dot\delta/Hk$.
The ratio of the extra displacement in redshift space to the true displacement is

$$\frac{\Delta r_z}{\Delta r_0} = \frac{\dot\delta}{H\delta}. \qquad (33.24)$$


In an Einstein-de Sitter cosmology, the density perturbation grows as $\delta \propto a$, so $\dot\delta/\delta = \dot a/a = H$.
In this case, the extra displacement is just equal to the real displacement. This means that
the amplitude of the density ripple in redshift space will appear twice as large as it really is. In
a low density universe we saw that the perturbation to the Hubble rate is, for a given density
contrast, lower. A simple, yet very good, approximation to the $\Omega$ dependence of the expansion rate
perturbation is

$$\frac{\Delta H}{H} = -\frac{1}{3} \Omega^{0.6} \frac{\Delta\rho}{\rho}. \qquad (33.25)$$


It is not hard to generalize the analysis to a plane wave of arbitrary direction $\hat{\mathbf{k}}$. The full effect is
seen for waves along the line of sight whereas a wave lying transverse to the line of sight is unaffected.
The final result for the amplitude $\delta_z(\mathbf{k})$ of the wave in redshift space is

$$ \delta_z(\mathbf{k}) = (1 + \Omega^{0.6}\mu^2)\,\delta_r(\mathbf{k}) \qquad (33.26) $$

where $\delta_r(\mathbf{k})$ is the amplitude of the wave in real-space and $\mu$ is the cosine of the angle between the
wave vector and the line of sight. The power spectrum is the expectation value of the square of the
Fourier transform, so the power-spectra in real and redshift-space are related by

$$ P_z(\mathbf{k}) = (1 + \Omega^{0.6}\mu^2)^2\, P_r(\mathbf{k}). \qquad (33.27) $$
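As a rough numerical illustration of (33.27), the sketch below evaluates the boost factor $(1 + \Omega^{0.6}\mu^2)^2$ for a given direction cosine, together with its average over directions. The function names are invented for this example, and the angle average assumes that $\mu$ is uniformly distributed on $[0, 1]$, which is an extra assumption and not part of the derivation above.

```python
import numpy as np


def kaiser_boost(omega, mu):
    """Ratio P_z / P_r = (1 + Omega^0.6 * mu^2)^2, as in (33.27)."""
    f = omega ** 0.6
    return (1.0 + f * np.asarray(mu, dtype=float) ** 2) ** 2


def angle_averaged_boost(omega):
    """Average of (1 + f mu^2)^2 for mu uniform on [0, 1] (assumed).

    Integrating term by term gives 1 + 2f/3 + f^2/5.
    """
    f = omega ** 0.6
    return 1.0 + 2.0 * f / 3.0 + f ** 2 / 5.0


if __name__ == "__main__":
    for omega in (1.0, 0.3):
        # Boost for a wave along the line of sight, then the direction average.
        print(omega, float(kaiser_boost(omega, 1.0)), angle_averaged_boost(omega))
```

For $\Omega = 1$ and $\mu = 1$ the boost in power is a factor of four, which is the doubling of the ripple amplitude described above.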


The redshift space distortion has been detected quite clearly in the 2dF survey and appears to
indicate $\Omega \simeq 0.3$, in accord with other estimates. The measurement of the effect is a little more
difficult than the simple analysis would suggest. This is because non-linear structures produce an
elongation of structures along the line of sight, the ‘finger of god’ effect. Care must be taken
so that this does not contaminate the linear theory squashing effect. The other main weakness of
this probe is that we have assumed that galaxies are unbiased tracers of the mass. If galaxies are
positively biased, the galaxy density contrast in real space is greater than that of the mass density
(which is what drives the peculiar velocity). The result of a positive bias is to give an artificially low
estimate of the density parameter. If we model the bias as a constant multiplier, $\delta_g(\mathbf{r}) = b\,\delta(\mathbf{r})$, then
the redshift space power spectrum is

$$ P_z(\mathbf{k}) = (1 + \beta\mu^2)^2\, P_r(\mathbf{k}) \qquad (33.28) $$

where

$$ \beta = \frac{\Omega^{0.6}}{b}. \qquad (33.29) $$
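The degeneracy between $\Omega$ and the bias can be made concrete with a small sketch. It inverts $\beta = \Omega^{0.6}/b$ for $\Omega$ under an assumed bias; the function names and the example values ($\Omega = 0.3$, $b = 1.5$) are illustrative assumptions, not measurements.

```python
def beta_parameter(omega, bias=1.0):
    """beta = Omega^0.6 / b, as in (33.29)."""
    return omega ** 0.6 / bias


def omega_from_beta(beta, assumed_bias=1.0):
    """Density parameter inferred from a measured beta for an assumed bias."""
    return (beta * assumed_bias) ** (1.0 / 0.6)


if __name__ == "__main__":
    true_omega, true_bias = 0.3, 1.5  # hypothetical universe
    beta = beta_parameter(true_omega, true_bias)
    # An analyst who wrongly assumes b = 1 recovers too low a value of Omega.
    print(beta, omega_from_beta(beta, assumed_bias=1.0))
    # With the correct bias the input value is recovered.
    print(omega_from_beta(beta, assumed_bias=true_bias))
```

Because only the combination $\beta$ is measured, the inferred $\Omega$ scales as $b^{1/0.6}$, so even a modest bias shifts the answer appreciably.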
33.2.6 Angular Clustering Surveys

An alternative to redshift surveys is to measure the clustering from the fluctuations in the counts of
galaxies projected onto the sky as revealed by a photometric survey. One can define a 2-dimensional
angular correlation function $w(\theta)$, and angular power spectrum $P_\theta(\kappa)$ much as in 3-dimensions.
As already mentioned, photometric surveys allow one to probe much larger volumes of space than
available using redshift surveys, but the disadvantages are that the structures tend to get washed
out in projection, and that non-Gaussianity becomes harder to discern. We will now show how the
2-dimensional statistics are related to their 3-dimensional analogs. In both cases we will assume that
we are trying to measure structure on scales much less than the depth of the survey.


Limber’s Equation 

Limber showed how to relate the angular correlation function $w(\theta)$ to the 3-dimensional density
correlation function $\xi(r)$. We define $w(\theta)$ such that the expectation value of the product of the
occupation numbers for two small cells on the sky of solid angles $\delta\Omega_1$, $\delta\Omega_2$ and separation $\theta$ is

$$ \langle n_1 n_2 \rangle = n_\theta^2\, \delta\Omega_1\, \delta\Omega_2\, (1 + w(\theta)). \qquad (33.30) $$

Here $n_\theta$ is the mean density of galaxies on the sky.

For simplicity we will assume that the universe can be treated as spatially flat on scales up to
the size of the survey and we will work in Cartesian comoving coordinates $\mathbf{r} = (x, y, z)$. We will also
assume that the survey solid angle is $\Omega \ll 1$, so we can erect a 2-dimensional Cartesian co-ordinate
system $(\theta_x, \theta_y)$ on the sky also, with $\mathbf{r} = (x, y, z) \simeq (z\theta_x, z\theta_y, z)$ to a very high precision. For an
infinitesimal cell, and for a given density contrast field $\delta(\mathbf{r})$, the probability that a cell at position $\boldsymbol{\theta}$
is occupied is

$$ P(n_1) = \delta\Omega_1 \int dz\, n(z)\, z^2\, (1 + \delta(z\theta_x, z\theta_y, z)). \qquad (33.31) $$

Taking the average over the ensemble of density fields, and using $\langle\delta\rangle = 0$, the expectation is

$$ \langle n_1 \rangle = \delta\Omega_1 \int dz\, n(z)\, z^2 \qquad (33.32) $$

so the mean density on the sky is

$$ n_\theta = \frac{\langle n_1 \rangle}{\delta\Omega_1} = \int dz\, n(z)\, z^2. \qquad (33.33) $$

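The probability that two cells, which we take to lie at the origin and at a distance $\theta$ along the $\theta_x$
axis, are both occupied is

$$ P(n_1, n_2) = \langle n_1 n_2 \rangle = \delta\Omega_1\, \delta\Omega_2 \int dz\, n(z) z^2 \int dz'\, n(z') z'^2\, (1 + \langle \delta(0, 0, z)\, \delta(\theta z', 0, z') \rangle). \qquad (33.34) $$

From the definition of $w(\theta)$ (33.30) we obtain

$$ w(\theta) = \frac{\int dz \int dz'\, n(z) z^2\, n(z') z'^2\, \xi\!\left(\sqrt{\theta^2 z'^2 + (z - z')^2}\right)}{\left[\int dz\, n(z) z^2\right]^2}. \qquad (33.35) $$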


If we are measuring at a small angle $\theta \ll 1$, and the spatial correlation function is a rapidly falling
function, as seems to be the case, then the spatial correlation function will become very small for
$|z - z'| \gg z\theta$. In this limit, performing the $z$ integration over the correlation function is a lot like
integrating over a $\delta$-function. We can replace the slowly varying function $n(z)z^2$ by its value at
$z = z'$ and, changing integration variable, we have

$$ w(\theta) = \frac{\int dz'\, n(z')^2 z'^4 \int dz\, \xi\!\left(\sqrt{\theta^2 z'^2 + z^2}\right)}{\left[\int dz\, n(z) z^2\right]^2}. \qquad (33.36) $$


This is known as Limber’s equation. 
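Limber's equation is easy to evaluate numerically for a power-law correlation function $\xi(r) = (r/r_0)^{-\gamma}$. The sketch below is only illustrative: the Gaussian selection function $n(z) = \exp[-(z/D)^2]$, the values $r_0 = 5$ and $D = 300$ (in the same arbitrary length unit), and the function names are all assumptions. The inner line-of-sight integral is done analytically, $\int dz\,(a^2 + z^2)^{-\gamma/2} = a^{1-\gamma}\sqrt{\pi}\,\Gamma((\gamma-1)/2)/\Gamma(\gamma/2)$, and the outer integrals by the trapezoid rule.

```python
import math

import numpy as np


def trapezoid(y, x):
    """Plain trapezoid rule, written out to avoid version-specific NumPy names."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def limber_w(theta, gamma=1.8, r0=5.0, depth=300.0, npts=4000):
    """w(theta) from (33.36) for xi(r) = (r/r0)^(-gamma), theta in radians.

    Assumed (hypothetical) selection function: n(z) = exp(-(z/depth)^2).
    """
    z = np.linspace(1e-3 * depth, 6.0 * depth, npts)
    n = np.exp(-(z / depth) ** 2)
    # Analytic inner integral over the line of sight, with a = theta * z'.
    const = math.sqrt(math.pi) * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0)
    inner = r0 ** gamma * const * (theta * z) ** (1.0 - gamma)
    numerator = trapezoid(n ** 2 * z ** 4 * inner, z)
    denominator = trapezoid(n * z ** 2, z) ** 2
    return numerator / denominator


if __name__ == "__main__":
    for theta in (0.001, 0.01, 0.1):
        print(theta, limber_w(theta))
    # Doubling the survey depth lowers the amplitude by about 2^(-gamma).
    print(limber_w(0.01, depth=600.0) / limber_w(0.01, depth=300.0))
```

In this sketch the angular correlation function scales as $\theta^{1-\gamma}$ and its amplitude falls as $D^{-\gamma}$ with survey depth, which is the dilution by projection mentioned at the start of this section.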
Angular Power Spectrum

In the same approximation, one can show that the angular power spectrum

$$ P_\theta(\boldsymbol{\kappa}) = \int d^2\theta\, w(\theta)\, e^{i\boldsymbol{\kappa}\cdot\boldsymbol{\theta}} \qquad (33.37) $$

is related to the 3-dimensional power spectrum by

$$ P_\theta(\kappa) = \int dz\, n(z)^2\, z^2\, P(\kappa/z). \qquad (33.38) $$

To understand this one can think of the total projected galaxy counts as the superposition of a
set of slabs. If we measure the transform of the counts at some wave-vector $\boldsymbol{\kappa}$ then we will see
contributions from all 3-dimensional plane waves which project to the appropriate angular frequency;
i.e. they must have perpendicular wave-vector $\mathbf{k}_\perp = \boldsymbol{\kappa}/z$. If the slabs are each thick compared to
the wavelength then only waves with $\mathbf{k}$ very nearly perpendicular to the line of sight will contribute
appreciably; they must have $k_\parallel \lesssim 1/\Delta r$, with $\Delta r$ the slab thickness. This means that the power
spectrum at wave-number $\kappa$ can only feel a contribution from modes with $k \simeq \kappa/z$. Again, if $\Delta r$ is
large compared to the scale of clustering we can take the different slabs to be effectively uncorrelated
so we can just add the power from all the slabs. Letting the sum $\sum \Delta r \ldots \rightarrow \int dr \ldots$ we obtain the
result above.


Results 


Either (33.36) or (33.38) can be used to predict the two-dimensional statistics from the 3-dimensional
2-point function or power spectrum. The simple analysis here can readily be generalized to non-flat
spatial geometry, and one can also include evolution of $\xi(r)$ or $P(k)$. The relation is particularly
simple when the correlation function is a power-law $\xi(r) \propto r^{-\gamma}$, for which we readily find $w(\theta) \propto \theta^{1-\gamma}$,
or $w(\theta) \propto \theta^{-0.8}$ if $\gamma \simeq 1.8$. In Fourier space, the angular power spectrum has the same spectral
index as the 3-dimensional power spectrum. At bright magnitudes the observations agree with this
prediction, but there seems to be some flattening of the slope at faint magnitudes (corresponding to
galaxy redshifts on the order of unity).

Of great interest is the form of the 2-point functions at large scales (the small-scale behaviour
being relatively well understood), but great care must be taken to correctly determine $n(z)$ since
small-scale fluctuations arising nearby can masquerade as large-scale fluctuations at greater distance.


Statistical Uncertainty 

Great care is needed in interpreting estimates of the angular correlation function since the
estimates at different angular scales are highly correlated. To see why, it is again
easiest to consider the power spectrum. As always, the angular power spectrum has a speckly form,
with coherence scale, or speckle size, $\Delta\kappa \sim 1/\Theta$, with $\Theta$ the size of the survey. The expectation value
of the power-spectrum is the true power, but there are relative fluctuations of order unity within
each coherence patch. The error in $w(\theta)$, which we will denote by $\Delta w$, is driven by the fluctuations
$\Delta P(\kappa)$ about the mean value. To compute the variance we can model the estimated power as a
set of discrete independent values with mean value $\langle P \rangle = P(\kappa)$ and variance $\langle (\Delta P)^2 \rangle = P^2(\kappa)$. If
we compute the error variance in $w(\theta) = (2\pi)^{-2} \int d^2\kappa\, P(\kappa)\, e^{i\boldsymbol{\kappa}\cdot\boldsymbol{\theta}}$ we find

$$ \langle (\Delta w)^2 \rangle \sim (\Delta\kappa)^4 \sum_i \sum_j \langle \Delta P_i\, \Delta P_j \rangle \sim (\Delta\kappa)^4 \sum_i \langle (\Delta P)^2 \rangle \sim (\Delta\kappa)^2 \int d^2\kappa\, P^2(\kappa). \qquad (33.39) $$

Now if $P(\kappa) \propto \kappa^{-1.2}$, as seems to be the case, then this integral is ‘infra-red divergent’; it is dominated
by the few lowest frequency speckles. The error $\Delta w(\theta)$ produced by these dominant speckles has very
strong long range correlations.
Performing the same analysis for the 3-dimensional correlation function we find

$$ \langle (\Delta\xi)^2 \rangle \sim (\Delta k)^6 \sum_i \sum_j \langle \Delta P_i\, \Delta P_j \rangle \sim (\Delta k)^6 \sum_i \langle (\Delta P)^2 \rangle \sim (\Delta k)^3 \int d^3k\, P^2(k). \qquad (33.40) $$

With $P(k) \propto k^{-1.2}$ this integral is ‘ultra-violet’ divergent; more realistically, what this means is
that if we average the correlation function over bins of width $\Delta r$ then the integral will be cut off
at $k_{\max} \sim 1/\Delta r$. There is still a long-range component to the errors in $\xi$, but it is smaller than
the bin-to-bin variance. In either case, the best way to understand the statistical uncertainty is via
power spectrum analysis, and most recent studies use this technique.
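The two divergence statements can be checked with a line of algebra. The following is a sketch assuming a pure power law $P \propto k^{-1.2}$ over the whole range, a low-frequency cut-off $\kappa_{\min} \sim 1/\Theta$ set by the survey size and a high-frequency cut-off $k_{\max} \sim 1/\Delta r$ set by the bin width; numerical prefactors are not tracked.

```latex
\begin{align}
\int_{\kappa_{\min}} d^2\kappa\, P^2(\kappa)
  &\propto \int_{\kappa_{\min}}^{\infty} \kappa^{-2.4}\, \kappa\, d\kappa
   = \frac{\kappa_{\min}^{-0.4}}{0.4}, \\
\int^{k_{\max}} d^3k\, P^2(k)
  &\propto \int_{0}^{k_{\max}} k^{-2.4}\, k^{2}\, dk
   = \frac{k_{\max}^{0.6}}{0.6}.
\end{align}
```

The angular case is controlled by the lower limit, so the error is dominated by the largest modes in the survey, while the 3-dimensional case is controlled by the upper limit, so the error is dominated by the bin-scale modes.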


33.3 Bulk-Flows 

Deviations from pure Hubble expansion, or bulk-flows, provide another probe of large scale structure.
This probe exploits the linearized continuity equation $\dot\delta = -\nabla\cdot\mathbf{u}$ which relates the comoving peculiar
velocity to the growth of the density contrast $\delta(\mathbf{r})$. Bulk-flow studies therefore provide a direct probe
of the mass fluctuations, and are independent of biasing. In principle, these studies can be used to
determine the power-spectrum of the mass fluctuations. However, as we shall describe, bulk flow
measurements can only be made for nearby galaxies, so the volume that can be probed is quite
small and the power spectrum estimates therefore have large sample variance. Instead, what is more
usually done is to compare the flows and the galaxy density contrast, and thereby try to determine
the fluctuation growth rate.


33.3.1 Measuring Bulk-Flows 


The velocity in question is the peculiar velocity: the deviation from pure Hubble flow. One very
useful datum comes from the CMB dipole anisotropy, which measures the Earth’s peculiar velocity
with respect to the frame in which the CMB is isotropic. Since the dipole anisotropy is much larger,
by a factor ~ 30, than the intrinsic anisotropy, this is effectively just our peculiar velocity. This tells
us that the Earth is moving at about 300 km/s. We are moving in roughly the opposite direction
around the Milky Way at a speed of about 220 km/s, so taking the vector sum we find that the
Galaxy is moving at about 500 km/s relative to the cosmic rest frame.

In principle, one can measure the line-of-sight peculiar velocities for clusters of galaxies from the 
kinematic Sunyaev Zel’dovich effect, but there is little useful information as yet. 

What has proved more useful is to measure the relative velocities of other galaxies in our
neighbourhood, and thereby extend the measurement of the Milky Way motion to larger scales. Measuring
bulk-flows is conceptually straightforward; we simply need to determine the distance $r$ to the galaxy
in question, we then take the recession velocity of the galaxy and correct this for the motion of
the Milky Way (to obtain the recession velocity $v_{\rm CMB}$ that would be measured by an observer at
our location who happens to see no CMB dipole moment). The component of the galaxy’s peculiar
velocity along the line of sight is

$$ v_{\rm pec} = v_{\rm CMB} - Hr. \qquad (33.41) $$
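Equation (33.41) amounts to a one-line calculation. The sketch below assumes a Hubble constant of 70 km/s/Mpc purely for illustration; the function name and the example galaxy are hypothetical.

```python
def peculiar_velocity_los(v_cmb_kms, distance_mpc, hubble_kms_per_mpc=70.0):
    """Line-of-sight peculiar velocity v_pec = v_CMB - H r, as in (33.41).

    v_cmb_kms: recession velocity already corrected to the CMB frame.
    distance_mpc: redshift-independent distance estimate.
    hubble_kms_per_mpc: assumed value of H; 70 is an illustrative choice.
    """
    return v_cmb_kms - hubble_kms_per_mpc * distance_mpc


if __name__ == "__main__":
    # Hypothetical galaxy at 100 Mpc receding at 7500 km/s in the CMB frame.
    print(peculiar_velocity_los(7500.0, 100.0))
```

Note that the result is a small difference of two large numbers, which is why the accuracy of the distance estimate matters so much.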


The tricky part is determining accurate distances. If galaxies were ‘standard candles’ this would
be straightforward, but galaxies have a wide range of intrinsic luminosities. There are, however,
strong correlations between the intrinsic luminosity and distance-independent measurable quantities
such as the rotation velocity or velocity dispersion. This is not surprising; more massive galaxies have
larger rotation velocities and might be expected to be more luminous. Spiral galaxies, for instance,
obey a strong correlation between intrinsic luminosity and rotation velocity of the form $L \propto v^\alpha$,
with $\alpha \simeq 4$ (the best fitting slope depends on the passband in which the flux is measured). This
is known as the Tully-Fisher relation (see figure 33.1). The rotation velocity provides an estimate
of the intrinsic luminosity and comparing with the measured flux then gives the distance to the
galaxy. There is a similar relation for elliptical galaxies between the luminosity and the velocity
dispersion, known as the Faber-Jackson relation. It was subsequently realized that a more accurate
prediction for the intrinsic luminosity can be given from measurements of both the velocity dispersion
and the surface brightness, another distance-independent observable. This relation is known as the
fundamental plane. Other methods for determining distances to galaxies include supernovae and
surface brightness fluctuations.

Figure 33.1: Illustration of the Tully-Fisher relation. Open and closed dots denote measurements of
the logarithm of circular velocity and apparent magnitude for galaxies in two regions of space, say
in two different clusters. There is a tight relation between $m$ and $\log(v_c)$ in both cases, but with an
offset. The most straightforward interpretation of this is that there is a universal relation between
intrinsic luminosity and $v_c$ but that the clusters are at different distances. This type of measurement
allows one to determine relative distances to distant galaxies.
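The way a Tully-Fisher offset turns into a relative distance can be sketched as follows. At fixed rotation velocity the intrinsic luminosity is assumed to be the same, so an offset $\Delta m = m_2 - m_1$ in apparent magnitude reflects only the inverse-square law, giving $d_2/d_1 = 10^{\Delta m/5}$. The function names and example numbers are illustrative.

```python
def relative_distance(delta_m):
    """Distance ratio d2/d1 implied by a magnitude offset delta_m = m2 - m1.

    Assumes both samples obey the same luminosity-velocity relation, so that
    at fixed rotation velocity the flux ratio is (d1/d2)^2.
    """
    return 10.0 ** (0.2 * delta_m)


def luminosity_ratio(velocity_ratio, alpha=4.0):
    """L2/L1 for L proportional to v^alpha; alpha ~ 4 as quoted in the text."""
    return velocity_ratio ** alpha


if __name__ == "__main__":
    # A cluster whose relation sits 1.5 magnitudes fainter is about twice as far.
    print(relative_distance(1.5))
    # Doubling the rotation velocity raises the luminosity sixteen-fold.
    print(luminosity_ratio(2.0))
```

Only relative distances follow from this argument; an absolute scale requires calibrating the relation with galaxies of independently known distance.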

A common property of all such distance estimates is that they give distances with a constant,
or nearly constant, fractional error. The most precise methods give fractional distance errors at
about the 10% level. If we assume that our motion of ~ 500 km/s is not atypical, as seems to be
the case, this says that, for a single galaxy, the peculiar velocity can only be determined if the
recession velocity is $v \ll 5000$ km/s. One can do somewhat better by averaging together distances
for a collection of galaxies in some region of space, which are assumed to share a common peculiar
velocity, but the range of such methods is still quite limited.
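The 5000 km/s figure follows from simple error propagation: a fractional distance error $\epsilon$ gives a velocity error $\epsilon H r \simeq \epsilon v$, and averaging $N$ galaxies that share a common flow reduces it by $\sqrt{N}$. The sketch below assumes independent errors; the function names are invented for the example.

```python
import math


def peculiar_velocity_error(v_recession_kms, fractional_distance_error=0.1, n_galaxies=1):
    """Error on v_pec = v - H r when r carries a fixed fractional error.

    Assumes independent errors, so averaging N galaxies gains a factor sqrt(N).
    """
    return fractional_distance_error * v_recession_kms / math.sqrt(n_galaxies)


def limiting_recession_velocity(v_peculiar_kms=500.0, fractional_distance_error=0.1,
                                n_galaxies=1):
    """Recession velocity at which the error equals the typical peculiar velocity."""
    return v_peculiar_kms * math.sqrt(n_galaxies) / fractional_distance_error


if __name__ == "__main__":
    print(peculiar_velocity_error(5000.0))                 # single galaxy
    print(peculiar_velocity_error(5000.0, n_galaxies=25))  # group of 25
    print(limiting_recession_velocity())                   # single-galaxy limit
```

Even a group of 25 galaxies only extends the useful range by a factor of five, which illustrates why the range of these methods remains limited.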

The earliest applications of such measurements were to the local supercluster (LSC). This is a
system whose center lies ~ 10 Mpc from the Milky Way and which appears to have a density contrast
in galaxies $\Delta n/n \sim 2$ within our radius. Measuring distances to, and recession velocities of, galaxies
in this system indicates that the Hubble expansion is retarded by about 30%. This indicates a fairly
low $\Omega$. Only part of the ~ 500 km/s motion of the Milky Way can be attributed to the gravitational
pull of the local supercluster. More accurate inspection of the flows within the LSC revealed a tidal
shear, indicating the presence of some external mass. Once a deeper sample of elliptical galaxies
became available it became apparent that there are several superclusters in our vicinity that produce
a rather complicated flow pattern.

These measurements have been combined with redshift surveys to determine $\Omega$ (or more generally
$\beta$). There are two approaches that have been used. The first is to use the galaxy density field to
compute the acceleration field and thereby predict the peculiar velocity. This is a non-local method,
since the velocity of a galaxy may depend on quite distant mass concentrations. It is then important
that the redshift survey have nearly full sky coverage and be sufficiently deep to encompass all of
the important mass fluctuations. The second approach is to compute the divergence of the velocity
field and then compare this with the density. This is a local comparison. While it would seem that
computing $\nabla\cdot\mathbf{u}$ would require all 3 components of the velocity field, this is not the case. The reason
for this is that in linear perturbation theory the flow is a potential field. This means that one can
determine the velocity potential simply by integrating the line of sight component of the velocity,
and one can thereby reconstruct all 3 components of the velocity field from measurements of only
one. Application of these methods, mostly using the redshift surveys derived from the IRAS satellite
observations, indicates a fairly high density parameter. How this can be reconciled with the consensus
for low values is not yet clear.


33.4 Microwave Background Anisotropies 

A third probe of linear structures is cosmic microwave background anisotropies. This is rather
different from galaxy clustering and bulk flows in that the anisotropy is generated at early times.
Small angle anisotropy, for instance, probes the state of the universe at the redshift of recombination
$z_{\rm dec} \simeq 1000$.

33.4.1 Recombination and the Cosmic Photosphere 

First we need to understand where the photons we are seeing originated. In the standard cosmology
the universe was highly ionized prior to $z_{\rm dec}$ and then rather rapidly became neutral. Detailed
calculations show that the ‘visibility function’, which gives the distribution over redshift of last
scattering for CMB photons, is correspondingly narrow, with width $\Delta z/z \sim 1/10$. In an Einstein
- de Sitter cosmology, the comoving distance $\omega$ is related to redshift by

$$ \omega = 1 - \frac{1}{\sqrt{1 + z}}. \qquad (33.42) $$

This means that the horizon (the surface of infinite redshift) is at $\omega = 1$, whereas the surface $z = z_{\rm dec}$
is at $\omega_{\rm dec} \simeq 1 - 1/\sqrt{1000} \simeq 0.97$. The cosmic photosphere is then a rather narrow fuzzy shell quite
close to the horizon. If the universe is spatially flat then the angle subtended by the horizon size
at decoupling is $\theta_{H,\rm dec} \simeq 1/\sqrt{z_{\rm dec}}$ or about 2 degrees. If the universe is open, this angular scale is
reduced.
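The numbers quoted for the photosphere can be reproduced directly. This minimal sketch uses the Einstein - de Sitter relation $\omega = 1 - 1/\sqrt{1+z}$ and the small-angle estimate $\theta_H \simeq 1/\sqrt{z_{\rm dec}}$; the function names are illustrative and $z_{\rm dec} = 1000$ is the round value used in the text.

```python
import math


def comoving_distance_eds(z):
    """omega = 1 - 1/sqrt(1 + z), as in (33.42); the horizon is at omega = 1."""
    return 1.0 - 1.0 / math.sqrt(1.0 + z)


def horizon_angle_deg(z_dec=1000.0):
    """Angle subtended by the horizon at decoupling in a flat universe, in degrees."""
    return math.degrees(1.0 / math.sqrt(z_dec))


if __name__ == "__main__":
    print(comoving_distance_eds(1000.0))  # close to 0.97
    print(horizon_angle_deg())            # a little under 2 degrees
```

Perturbations subtending more than this angle were larger than the horizon at decoupling, which is the distinction drawn between the large-angle and small-angle regimes.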


33.4.2 Large-Angle Anisotropies 


Consider first the anisotropy generated by perturbations larger than the horizon size at decoupling.
The situation is sketched in figure 33.2. The primary effect driving large-angle anisotropy is the
so-called Sachs-Wolfe effect. Photons which emerge from an over-dense (under-dense) region suffer
what is effectively a gravitational redshift (blueshift). However, for subtle reasons, the fractional
photon energy change, which is equal to the fractional change in temperature, is one third of the
Newtonian potential perturbation at the point of emission. The Sachs-Wolfe temperature anisotropy
is

$$ \frac{\Delta T}{T} = \frac{1}{3}\,\delta\phi_e. \qquad (33.43) $$


The temperature anisotropy is therefore on the order of the density perturbation amplitude at 
horizon crossing. 

In addition to the temperature anisotropy induced as the photons ‘climb out’ of the potential
well where they originate, there is an additional source of temperature anisotropy generated as the
photons pass through intervening inhomogeneity (photon C in figure 33.2). This effect was first
calculated by Rees and Sciama. It is, however, sub-dominant. First, in a flat universe, and
in linear theory, the effect vanishes. In an open universe there is a non-vanishing effect, but the
amplitude is on the order $\Delta T/T \sim HR\,\delta\phi$, where $R$ is the size of the perturbation; this is smaller
than the Sachs-Wolfe effect by the factor $HR$ which is the size of the perturbation in units of the
horizon size. Now the effect of multiple perturbations along the line of sight will add in quadrature,
resulting in a net anisotropy $\Delta T/T \sim \sqrt{HR}\,\delta\phi$, but this is still small compared to the Sachs-Wolfe
effect.

Figure 33.2: Schematic illustration of the generation of large angle CMB anisotropy. Photon A
arrives from an unperturbed region of the universe. Photon B emerges from an over-dense region. It
suffers the ‘Sachs-Wolfe’ effect, effectively a gravitational redshift, and has slightly lower energy
than photon A when it reaches the observer. Photon C passes through an over-dense region on its
way from the last scattering surface, and suffers the ‘Rees-Sciama’ effect.

The large-scale temperature anisotropy effectively provides a map of the Newtonian potential
variation on the last scattering surface. The prediction of inflation (and also cosmic string models) is
that the temperature fluctuation field should take the scale-invariant flicker-noise form corresponding
to spectral index $n = 1$. This is just what was observed by the COBE satellite.

33.4.3 Small-Angle Anisotropies 

Small-scale anisotropies are more complicated. As discussed in chapter 31, prior to de-coupling,
and on scales small compared to the sound horizon, the baryon-photon fluid undergoes acoustic
oscillations. In such sound waves, there are adiabatic variations of the temperature $\Delta T/T \sim \Delta\rho/\rho$
and there are also peculiar velocities $v \sim c_s\,\Delta\rho/\rho$. Both of these give rise to temperature anisotropy.
Detailed calculation of small-scale anisotropy is quite complicated and must be performed numerically.
The net result of such calculations is the angular power spectrum for the fluctuations. This is usually
stated in terms of the mean square Legendre polynomial coefficient $C_\ell$. However, since these effects
appear at small angle, one is free to approximate the sky as effectively flat, and one can then express
the temperature variance in terms of the power-spectrum. The result is a spectrum of small-angle
anisotropies with bumps and wiggles extending from the horizon scale down to the damping cut-off.
The amplitude of the wiggles in the power spectrum is larger the larger the baryon content.

33.4.4 Polarization of the CMB 

In addition to the temperature anisotropy $\Delta T/T$, the CMB is predicted to display polarization.
Polarization of the CMB arises by virtue of the anisotropic nature of Thomson scattering; if an
electron is illuminated with radiation which is anisotropic then the radiation it scatters will be
polarized. For example, if we lie along the $z$-axis and observe an electron at the spatial origin which
is being illuminated by a lamp sited out along the $y$-axis then the radiation we see will be linearly
polarized in the $x$-direction (see §10.7). In general, the degree of polarization is proportional to
the quadrupole moment of the incident radiation. Measurement of the CMB polarization therefore
provides us with a kind of ‘remote-sensing’ of the anisotropy of the radiation field as it was at the
time of last scattering.


33.5 Weak Lensing 


Another probe of large-scale structure is weak gravitational lensing. Light rays propagating to us
through the inhomogeneous universe get tugged from side to side by mass concentrations. The
deflection of a light ray that passes a point mass $M$ at impact parameter $b$ is

$$ \theta_{\rm def} = \frac{4GM}{c^2 b}. \qquad (33.44) $$

This is just twice what Newtonian theory would give for the deflection of a test particle moving
at $v = c$. In this picture we are imagining the radiation to be test particles being pulled by
a gravitational acceleration. There is another useful way to look at this using wave-optics; the
inhomogeneity of the mass distribution causes space-time to become curved. As discussed earlier,
the space in an over-dense region is positively curved, as illustrated in figure 28.7. This means that
light rays propagating through the over-density have to propagate a slightly greater distance than
they would in the absence of the density perturbation. Consequently the wave-fronts get retarded
slightly in passing through the over-density and this results in focusing of rays. There is still another
way to picture the situation: the optical properties of a lumpy universe are, in fact, essentially
identical to those of a block of glass of inhomogeneous density where the refractive index is

$$ n(\mathbf{r}) = 1 - 2\phi(\mathbf{r})/c^2 \qquad (33.45) $$

with $\phi(\mathbf{r})$ the Newtonian gravitational potential. In an over-dense region, $\phi$ is negative, so $n$ is
slightly greater than unity. In this picture we think of space as being flat, but that the speed of light
is slightly retarded in the over-dense region. All three of the above pictures give identical results.

The gravitational potential of a bound structure is on the order of $\delta\phi \sim \sigma_v^2$, where $\sigma_v$ is the
velocity dispersion or circular velocity. Now the velocity dispersion for a massive cluster of galaxies,
for example, is $\sigma_v \sim 1000$ km/s, or about 0.003 times $c$. One such object can cause a deflection
on the order of 20″, and multiple objects along the line of sight would give random deflections adding
in quadrature to give still larger deflection. Images of distant objects are therefore shifted from
their “true” positions (i.e. the positions they would have if we could somehow switch off the gravity
perturbation). Unfortunately, this deflection is not easily measurable, since we do not know the true
positions. Instead, the effect exploited in weak lensing is the differential deflection, i.e. the fact
that the deflection suffered by the light from one side of a distant galaxy is slightly different from
the deflection for the other side. This causes a systematic distortion, or ‘shearing’, of the shapes
of the distant galaxies. The effect is similar to the distortion of distant objects seen in a ‘mirage’,
though it is a much weaker effect.
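The sizes of these deflections are easy to check numerically. The sketch below evaluates the point-mass formula $4GM/c^2 b$ for a ray grazing the Sun, and then estimates the deflection by a cluster with $\sigma_v = 1000$ km/s. The cluster estimate uses the singular isothermal sphere result $4\pi(\sigma_v/c)^2$, which is an additional modelling assumption not made in the text; it simply supplies the numerical factor in front of $\delta\phi/c^2 \sim (\sigma_v/c)^2$. Constants are rounded and the function names are illustrative.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi


def point_mass_deflection_arcsec(mass_kg, impact_parameter_m):
    """Deflection 4 G M / (c^2 b) of (33.44), converted to arcseconds."""
    return 4.0 * G * mass_kg / (C ** 2 * impact_parameter_m) * ARCSEC_PER_RADIAN


def isothermal_deflection_arcsec(sigma_v_kms):
    """Deflection 4 pi (sigma_v / c)^2 for an assumed singular isothermal sphere."""
    return 4.0 * math.pi * (sigma_v_kms * 1.0e3 / C) ** 2 * ARCSEC_PER_RADIAN


if __name__ == "__main__":
    # Ray grazing the solar limb: M = 1.989e30 kg, b = 6.957e8 m.
    print(point_mass_deflection_arcsec(1.989e30, 6.957e8))
    # Massive cluster with sigma_v = 1000 km/s.
    print(isothermal_deflection_arcsec(1000.0))
```

The solar value is the classic 1.75″, and the cluster value comes out at a few tens of arcseconds, in line with the order-of-magnitude figure quoted above.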

We can estimate the size of the effect as follows: Consider a mass concentration of size $R$ and
mass $M$ at a distance $D_{ol}$ from the observer as shown schematically in figure 33.3. Now consider a
thin conical bundle of rays with opening angle $\theta_0$ emerging from the observer and propagating back
to some distant ‘source plane’. In the absence of the lens this bundle would intersect the source plane
in a circle of radius $l = D_{os}\theta_0$. Now the lens will introduce some deflection $\theta_{\rm def} \sim GM/c^2 b$, and there
will also generally be some change in the deflection across the bundle of $\Delta\theta_{\rm def} \sim GM\,\delta b/c^2 b^2$. This
relative deflection will cause the initially circular bundle of rays to become elliptical. The length of
the ellipse in the radial direction will be $l' = l + D_{ls}\Delta\theta_{\rm def}$. The fractional stretching of the ellipse,
or ‘image shear’, is

$$ \gamma = \frac{l'}{l} - 1 = \frac{D_{ls}\,\Delta\theta_{\rm def}}{D_{os}\,\theta_0}. \qquad (33.46) $$

Now we can also write the differential deflection as $\Delta\theta_{\rm def} = \theta_0\, d\theta_{\rm def}/d\theta$, where $d\theta_{\rm def}/d\theta$ is the
distortion tensor; it gives the rate of change of deflection angle with position on the sky. The image
shear is then

$$ \gamma = \frac{D_{ls}}{D_{os}}\,\frac{d\theta_{\rm def}}{d\theta}. \qquad (33.47) $$

Figure 33.3: Schematic illustration of weak lensing. In the absence of the deflecting lens, a conical
bundle of rays emerging from the observer will intercept the source plane in a circle. In the presence
of the lens, the bundle will intercept the source plane in an ellipse. An object of this ellipticity on the
source plane will therefore appear circular. Conversely, a circular object will appear elliptical, but
with an ellipticity of the opposite ‘sign’ (i.e. it will appear stretched along the tangential direction
in this example). As shown in the main text, the fractional stretching of the image, also known as
the ‘image-shear’, is $\gamma = l'/l - 1 = (D_{ls}/D_{os})\, d\theta_{\rm def}/d\theta$.


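Now consider an over-dense lens of size $R$ and density contrast $\delta\rho/\rho$. The angular size of the
lens is $\theta \sim R/D_{ol}$. The deflection angle is

$$ \theta_{\rm def} \sim \frac{G\,\delta M}{R c^2} \sim \frac{G\,\delta\rho\, R^2}{c^2}. \qquad (33.48) $$

The deflection angle will vary smoothly with impact parameter, so the distortion is

$$ \frac{d\theta_{\rm def}}{d\theta} \sim \frac{\theta_{\rm def}}{\theta} \sim \frac{G\,\delta\rho\, R^2/c^2}{R/D_{ol}} \sim \frac{H^2 D_{ol} R}{c^2}\,\frac{\delta\rho}{\rho} \qquad (33.49) $$

where we have used $H^2 \sim G\rho$. The image shear is therefore

$$ \gamma \sim \frac{H^2 D_{ol} D_{ls} R}{c^2 D_{os}}\,\frac{\delta\rho}{\rho}. \qquad (33.50) $$

Note that the strength of the effect scales as the product of the density and the size of the object;
i.e. it is proportional to the surface density of the lens.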

This is the effect due to a single object. This may be relevant for highly non-linear objects such
as clusters of galaxies, where the space filling factor is small and the probability that a line of sight
intercepts such an object is small. For large-scale structures with $\delta\rho/\rho \lesssim 1$ the filling factor is of
order unity, and the net effect will be the superposition of a large number of structures along the
line of sight. In fact, the shear is the integral of the tidal field along the line of sight. Each structure
will give a contribution to the shear with a random direction and strength. This means that the net
shear variance $\langle\gamma^2\rangle$ will be the sum of the individual shear variances, or, equivalently, that the net
effect will be larger than that from a single object by roughly $\sqrt{N}$ where $N \sim D_{os}/R$ is the number
of structures. The factor $D_{ol} D_{ls}$ means that we get a small effect from structures which are very
close to either the source or the observer, so, to get a crude estimate of the effect we can assume
that all of the distances are of the same order of magnitude, $D$. This gives the prediction for the root
mean square shear:

$$ \langle\gamma^2\rangle^{1/2} \sim \frac{H^2 D^{3/2} R^{1/2}}{c^2}\,\frac{\delta\rho}{\rho}. \qquad (33.51) $$

This tells us that the strength of the effect increases as the 3/2 power of the distance to the source;
therefore to detect the effect one wants to use sources at cosmological distance (this also gives a large
number of background galaxies, which also helps). For sources at $z \sim 1$, the distance is $D \sim c/H$, so
the strength of the effect is then

$$ \langle\gamma^2\rangle^{1/2} \sim \left(\frac{HR}{c}\right)^{1/2} \frac{\delta\rho}{\rho}. \qquad (33.52) $$

This would suggest that, for super-cluster scale structures with $R \sim 10$ Mpc and $\delta\rho/\rho \sim 1$, the root
mean square shear would be on the order of 5%. More careful estimates (putting in factors like $4\pi$
and geometric factors properly) give a prediction for shear of order 1% on scales of one degree.

This is a very small effect. A 1% shear means that a circular object will appear elliptical, with
major axis about 1% larger than the minor axis. However, galaxies are already elliptical, with root
mean square ellipticity of about 30%, so the effect cannot be detected for a single object. What makes
the effect measurable is that the shear is spatially coherent. The effect from super-cluster and larger
scale structure will be coherent over large angular scales. Now the number of background galaxies
(the ‘cosmic wallpaper’) becomes very large; one can readily detect $\gtrsim 10^5$ galaxies per square
degree. By looking for a statistical tendency for the galaxy position angles to be anisotropically
distributed one can measure shears on the order of $\sim 0.3/\sqrt{N_{\rm gal}}$, which is comfortably smaller than
the prediction on degree scales. This is just the statistical error; to keep the systematic errors below
this requires very careful analysis of the images. Current measurements find consistent results on
scales of $\lesssim 10'$, and there is little data on larger scales as yet, but several large-scale surveys are being
carried out, so the outlook is promising.
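The two numbers that decide whether cosmic shear is detectable can be put side by side. The sketch below evaluates the crude estimate $\langle\gamma^2\rangle^{1/2} \sim (HR/c)^{1/2}\,\delta\rho/\rho$, which is what the $D^{3/2}$ scaling gives for $D \sim c/H$, and the statistical floor $0.3/\sqrt{N_{\rm gal}}$. A Hubble constant of 70 km/s/Mpc is assumed for illustration, and the function names are invented.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def rms_shear(scale_mpc, density_contrast=1.0, hubble_kms_per_mpc=70.0):
    """Order-of-magnitude rms shear, sqrt(H R / c) * (delta rho / rho).

    The Hubble constant is an assumed illustrative value; factors of order
    unity (4 pi, geometry) are deliberately ignored, as in the text.
    """
    hubble_radius_mpc = C_KMS / hubble_kms_per_mpc
    return math.sqrt(scale_mpc / hubble_radius_mpc) * density_contrast


def shape_noise_floor(n_galaxies, rms_ellipticity=0.3):
    """Statistical error on the mean shear from N randomly oriented galaxies."""
    return rms_ellipticity / math.sqrt(n_galaxies)


if __name__ == "__main__":
    print(rms_shear(10.0))          # supercluster scale, contrast of order unity
    print(shape_noise_floor(1.0e5)) # one square degree of background galaxies
```

With these inputs the crude signal is about 5% and the noise floor about 0.1%, so even the more careful 1% prediction sits well above the statistical error in a square degree.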



Chapter 34 


Non-Linear Cosmological Structure 


In chapter 31 we explored the evolution of small amplitude perturbations of otherwise homogeneous
cosmological models. This provides an accurate description of the evolution of structure from very
early times. On sufficiently large scales, the structure is still in the linear regime today, but small
scale structures have reached the point where $\delta\rho/\rho \gtrsim 1$ and have gone non-linear.

When dealing with the development of non-linear structure we can usually neglect radiation 
pressure and assume that the structures are much smaller than the horizon scale, so a Newtonian 
treatment is valid. However, the equations of motion are still relatively complicated and it is hard 
to find exact solutions except in highly idealized models such as spherical or planar 1-dimensional 
collapse. One approach to non-linear structure growth is to attempt to evolve the initial conditions 
forward from the linear regime numerically using either N-body simulations, to evolve the collisionless 
Boltzmann equation, or hydro-dynamical simulations to evolve the Euler, energy and continuity 
equations. The former is adequate to describe the evolution of collisionless dark matter matter, but 
the latter is required if one also wants to treat the baryonic matter. Another possibility is to extend 
perturbation theory beyond linear order. This is an area where there has been much activity by 
theorists in recent years. These calculations typically assume a Gaussian initial density field, and 
then compute the emergence of non-Gaussianity, e.g. the skewness, or the kurtosis of the density 
distribution. Such results are limited to the ‘quasi-linear’ regime; i.e. density contrasts $\delta \lesssim 1$. This
is a rather limited range of validity. Also, since most interest is in theories with ‘hierarchical’ initial 
fluctuation spectra, when one scale is just going non-linear, there are smaller scale structures which 
will be highly non-linear. Usually such calculations deal with this by assuming some smoothing of 
the initial $\delta$-field, but the validity of this is questionable.

Here I shall describe a number of approximate methods and models that directly address the 
‘quite-strongly non-linear’ regime. These models are typically quite idealized, but they are still 
useful as they provide insight into the way structure has evolved, and is evolving today. 


34.1 Spherical Collapse Model 


A simple model for the non-linear evolution of an initially positive density fluctuation is the ‘top-hat 
model’ in which there is an initial spherical over-density, with constant $\delta = \delta\rho/\rho$. We have already
analyzed this in the linear regime. The non-linear evolution of such a perturbation is illustrated in
figure 34.1.

The inner radius obeys

$$ \dot R^2 = 2GM/R - 2E_0 \qquad (34.1) $$

with ‘cycloidal’ solutions

$$ R = \frac{GM}{2E_0} (1 - \cos\eta) \qquad (34.2) $$

$$ t = \frac{GM}{(2E_0)^{3/2}} (\eta - \sin\eta). \qquad (34.3) $$

Figure 34.1: In the spherical ‘top-hat’ perturbation we excise some sphere of matter and replace
it with a smaller concentric uniform density sphere. The upper line depicts the expansion of the 
exterior mass shell. The interior, assumed here to be gravitationally bound, behaves like part of 
a closed FRW model (lower curve). If the interior is precisely uniform it will collapse to a final 
singularity and form a black hole. However, for perturbations much smaller than the horizon scale, 
the specific gravitational binding energy at the point of maximum expansion is small, and a huge 
collapse factor is required to form a black-hole. More realistically, any initial irregularity or angular 
momentum will cause the collapse to ‘bounce’, and we expect the system to then settle down to 
an equilibrated, or virialized, state with final radius roughly half that of the sphere at the point of 
maximum expansion. 


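The inner radius peaks at the ‘turn-around time’ $t_{\rm turn} = \pi GM/(2E_0)^{3/2}$, when the density is

$$ \rho = \frac{3M}{4\pi R_{\rm max}^3} = \frac{3\pi}{32 G t_{\rm turn}^2}. \qquad (34.4) $$

Compare this with the density of an Einstein - de Sitter ($k = 0$, $\Omega = 1$) background

$$ \bar\rho = \frac{1}{6\pi G t^2}. \qquad (34.5) $$

In such a background, the over-dense perturbation will turn around with density contrast

$$ \left(\frac{\rho}{\bar\rho}\right)_{\rm turn} = \frac{9\pi^2}{16} \simeq 5.55. \qquad (34.6) $$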


A perfectly spherical over-density would collapse to infinite density at $2 t_{\rm turn}$ and would form a
black hole. In the more realistic case we would expect a collapsing ‘blob’ to become increasingly
aspherical and virialize after contracting by about a factor 2 in radius. What we mean by virialization
is reaching a state where the radius is no longer contracting as in the collapse phase. While only a
rough guide to the real situation, this suggests that the system will virialize with a density $\sim 8$ times
larger than at turnaround. Now the background cosmology is expanding with scale-factor $a \propto t^{2/3}$,
so in the interval $t_{\rm turn} < t < 2 t_{\rm turn}$ the background will have expanded by a factor $2^{2/3}$ and will
therefore have decreased in density by a factor 4, giving a density contrast at virialization of

$$ \left(\frac{\rho}{\bar\rho}\right)_{\rm virial} = 18\pi^2 \simeq 180. \qquad (34.7) $$
Figure 34.2: Illustration of the kind of divergence free flow pattern induced by a point mass. The
peculiar velocity falls off as $\delta v \propto 1/R^2$, with the consequence that the flux of matter through any
shell of radius $R$ is independent of $R$. For $R > 0$ the density remains unperturbed, but there is a
net accumulation of mass at the origin.

This is for a perturbation of an Einstein - de Sitter background, which may not be realistic at very
late times, but the analysis is readily generalized to open models, since the only thing the interior 
‘knows’ about the background cosmology is the time since the big-bang. 
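
As a quick numerical check of the two density contrasts quoted above, the following minimal sketch evaluates them directly. The function names and the `collapse_factor` parameter are illustrative choices, and the factor 2 contraction in radius is the rough assumption made in the text rather than a derived result.

```python
import math

def turnaround_contrast():
    """Density contrast at turnaround for a top-hat in an Einstein-de Sitter
    background: rho = 3*pi/(32 G t^2) divided by rho_bar = 1/(6 pi G t^2)."""
    return (3.0 * math.pi / 32.0) * (6.0 * math.pi)

def virial_contrast(collapse_factor=2.0):
    """Contrast at t = 2 t_turn, assuming the blob shrinks in radius by
    `collapse_factor` while the background expands as a ~ t^(2/3)."""
    interior_gain = collapse_factor ** 3          # interior density goes as 1/R^3
    background_drop = (2.0 ** (2.0 / 3.0)) ** 3   # background density goes as 1/a^3
    return turnaround_contrast() * interior_gain * background_drop

print(round(turnaround_contrast(), 2))
print(round(virial_contrast(), 1))
```

Changing `collapse_factor` shows how sensitive the conventional value of about 180 is to the assumed amount of contraction.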

The perturbation conserves energy, so the final virial velocity $\sigma_v$ is related to the initial
perturbation to the Newtonian potential, $\delta\phi$, by

$$ \sigma_v^2 \simeq \delta\phi. \qquad (34.8) $$

Perhaps more interestingly, we can estimate the velocity dispersion for a recently virialized object
of a given size, or vice versa. For an object with an over-density of 180, the circular velocity is

$$ v_{\rm circ}^2 = \frac{GM}{R} \simeq 180 \times \frac{4}{3}\pi G \bar\rho R^2 \simeq 90 H^2 R^2. \qquad (34.9) $$

The line-of-sight velocity dispersion is $\sigma_v^2 \simeq v_{\rm circ}^2/2$, so the velocity dispersion and radius (at density
contrast 180) are related by

$$ R_{180} \simeq \frac{\sigma_v}{\sqrt{45}\, H}. \qquad (34.10) $$

For a rich cluster of galaxies like the Coma cluster, which has $\sigma_v \simeq 1000$ km/s, this gives $R_{180} \simeq
1.5 h^{-1}$ Mpc. While clusters do not have sharp edges (there being matter in-falling at greater
distances and denser material in the center which collapsed in the past), it is gratifying that this
estimate of the size of a cluster agrees very nicely with the size that George Abell assigned to such
objects.
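
The Coma estimate is easy to reproduce. The sketch below assumes only the relation $\sigma_v^2 \simeq 45 H^2 R^2$ and the usual parameterization $H = 100h$ km/s/Mpc; the function name is an illustrative choice.

```python
import math

def r180_mpc_per_h(sigma_v_kms):
    """Radius (in h^-1 Mpc) enclosing a mean over-density of 180, from
    sigma_v^2 = v_circ^2 / 2 = 45 H^2 R^2 with H = 100 h km/s/Mpc."""
    hubble_kms_per_mpc_per_h = 100.0
    return sigma_v_kms / (math.sqrt(45.0) * hubble_kms_per_mpc_per_h)

# A Coma-like cluster with a line-of-sight dispersion of 1000 km/s.
print(round(r180_mpc_per_h(1000.0), 2))
```

Since the radius is linear in $\sigma_v$, a poor group with a dispersion of a few hundred km/s would have a correspondingly smaller $R_{180}$.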

34.2 Gunn-Gott Spherical Accretion Model 

Another very illuminating model is that of Gunn and Gott (19??) who considered what happens if
one introduces a point-like ‘seed’ of mass $M_0$ into an otherwise uniform Einstein - de Sitter universe.

First consider the linear theory. At large radii, the mass will induce a peculiar acceleration at
physical distance $R$ of

$$ g = \frac{GM_0}{R^2}. \qquad (34.11) $$

Acting over a Hubble time $t \sim 1/H$ this will generate a peculiar infall velocity

$$ \delta v \sim g t \sim \frac{GM_0}{H R^2}. \qquad (34.12) $$

This kind of $\delta v \propto 1/R^2$ flow (see figure 34.2) is ‘divergence free’, so there is no change in the density
at large distances (think of two concentric comoving shells; the flux of matter across a surface is the
velocity times the area and is independent of radius, so there is no build up except at the center).
The amount of mass convected across a shell in one Hubble time is $\delta M \sim \bar\rho R^2 \delta v\, t$ which, with
(34.12) and $H^2 = 8\pi G\bar\rho/3$, gives $\delta M \sim M_0$. Thus the seed induces, after one expansion time, a
growing mode density perturbation $(\delta\rho/\rho)_i \sim M_0/M$, and this subsequently grows with time as
$\delta\rho/\rho = \delta M(t)/M \propto a(t)$. This is assuming, for simplicity, an Einstein - de Sitter background.

The amount of mass accumulated in the center is therefore

$$ \delta M(t) \sim M_0 \frac{a}{a_i} \propto a(t). \qquad (34.13) $$

This mass represents a density contrast of order unity at a physical radius $R$ such that $\delta M \sim \bar\rho R^3$,
and slightly inside will lie the turnaround radius

$$ R_t \sim (\delta M/\bar\rho)^{1/3} \propto a^{4/3}. \qquad (34.14) $$


As time goes on, progressively larger shells will turn around, collapse and virialize in some
complicated way with shell crossing etc. However, we may reasonably expect that the final specific binding
energy of a shell of a certain mass will be equal, modulo some factor of order unity, to its initial
specific binding energy $\delta\phi$. Now the initial binding energy is a power law in radius:

$$ \delta\phi \sim GM_0/R \propto 1/R \propto M^{-1/3}, \qquad (34.15) $$

whereas the final binding energy as a function of the final radius $R_f$ is

$$ \delta\phi \sim GM(R_f)/R_f \qquad (34.16) $$

where $M(R_f)$ is the mass within radius $R_f$. Equating these gives the scaling law

$$ M(R_f) \propto R_f^{3/4}. \qquad (34.17) $$

If the mass within radius $R_f$ is a power law in $R_f$ then so also is the density:

$$ \rho(R_f) \sim M(R_f)/R_f^3 \propto R_f^{-9/4}. \qquad (34.18) $$

This analysis then tells us that the virialized system should have a power law density profile. What 
is interesting about this result is that it is very close to the $\rho(R) \propto R^{-2}$ density run for a flat rotation
curve halo, and also similar to the profile of clusters of galaxies, which are also often modeled as 
‘isothermal spheres’. 
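
The exponent bookkeeping in this argument can be done with exact fractions. The sketch below is only an illustration of the energy-matching step; the function name and the option of passing a different initial exponent are assumptions added here, not part of the original model.

```python
from fractions import Fraction

def secondary_infall_slopes(initial_energy_exponent=Fraction(-1, 3)):
    """Given delta_phi ~ M**e initially (e = -1/3 for a point seed) and
    delta_phi ~ G M / R_f finally, return (mass_slope, density_slope) such
    that M ~ R_f**mass_slope and rho ~ R_f**density_slope."""
    # M**e ~ M / R_f  =>  R_f ~ M**(1 - e)  =>  M ~ R_f**(1 / (1 - e))
    mass_slope = 1 / (1 - initial_energy_exponent)
    density_slope = mass_slope - 3
    return mass_slope, density_slope

print(secondary_infall_slopes())
```

With the default exponent this returns the slopes 3/4 and -9/4 of (34.17) and (34.18); an exactly isothermal profile, with density slope -2, would correspond to a binding energy that is independent of mass.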


34.3 The Zel’dovich Approximation

In linear theory, and for growing perturbations in an Einstein - de Sitter model, particles move with
peculiar velocity

$$ {\bf v}({\bf r}, t) = (t/t_0)^{1/3}\, {\bf v}_0({\bf r}) \qquad (34.19) $$

where now ${\bf r}$ is a comoving spatial coordinate and ${\bf v}_0$ is the peculiar velocity at some initial time.
This says that the peculiar velocity field just grows with time at the same rate $v \propto t^{1/3}$ at all points
in space.
The physical displacement of a particle in time $dt$ is $d{\bf x} = {\bf v}\,dt$, so the comoving displacement is

$$ d{\bf r} = \frac{d{\bf x}}{a} = \frac{{\bf v}}{a}\,dt = {\bf u}\,dt \qquad (34.20) $$

where the comoving peculiar velocity is ${\bf u} = {\bf v}/a$. The rate of change of comoving position with
scale factor $a$ is then

$$ \frac{d{\bf r}}{da} = \frac{{\bf v}}{a} \frac{dt}{da} \qquad (34.21) $$

but with $v \propto t^{1/3}$ and $a \propto t^{2/3}$, so $da/dt \propto t^{-1/3}$, this says that

$$ \frac{d{\bf r}}{da} \propto t^0. \qquad (34.22) $$

Therefore, if we define a new ‘time’ $\tau \propto a$, then the particles move ballistically in comoving coordinate
space: $d{\bf r}/d\tau = {\rm constant}$. Zel’dovich’s approximation is to assume that this ballistic motion continues
into the non-linear regime.

The result is a Lagrangian mapping resulting in formation of caustics, or surfaces of infinite
density. This is very analogous to the formation of caustics on the swimming pool floor, which we
explored in our study of geometric optics in chapter 8. There the horizontal deflection of the rays
(a 2-dimensional vector displacement) increases linearly with distance from the surface, and here
the 3-dimensional comoving displacement increases linearly with ‘time’ $\tau$. We can write the actual
comoving position or Eulerian coordinate ${\bf x}$ as a function of the initial or Lagrangian coordinate ${\bf r}$
as

$$ {\bf x}({\bf r}) = {\bf r} + \tau\, {\bf U}({\bf r}) \qquad (34.23) $$

where ${\bf U}$ is a suitably scaled version of ${\bf u}$.

Until caustics form, the density is

$$ \rho \propto \left| \frac{\partial {\bf x}}{\partial {\bf r}} \right|^{-1} \qquad (34.24) $$

where $|\partial{\bf x}/\partial{\bf r}|$ is the Jacobian of the transformation from Lagrangian to Eulerian coordinate. This
is just conservation of mass: $dM = \rho_L d^3r = \rho_E d^3x$, with $\rho_E$ and $\rho_L$ the densities in Eulerian and
Lagrangian space respectively. Now from (34.23), $\partial x_i/\partial r_j = \delta_{ij} + \tau\, \partial U_i/\partial r_j$, so we can also write
the density as

$$ \rho \propto \frac{1}{(1 + \tau\lambda_1)(1 + \tau\lambda_2)(1 + \tau\lambda_3)} \qquad (34.25) $$

where the $\lambda_i$ are the eigenvalues of the deformation tensor $\Phi_{ij} = \partial U_i/\partial r_j$.

If there is a negative eigenvalue $\lambda_1 < \lambda_2, \lambda_3$, then the density of a small comoving volume of
matter will become infinite with collapse along the appropriate principal axis when $\tau = -1/\lambda_1$.
Pancakes (or perhaps we should call them blinis) form with a multi-stream region sandwiched
between the caustic surfaces. These pancakes grow rapidly and intersect to form a cellular network
of walls or pancakes intersecting in lines with the matter in these lines draining into the nodes where
all three of the eigenvalues of $\partial U_i/\partial r_j$ go negative.
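
A one-dimensional toy version makes the approach to a caustic concrete. The sketch below assumes a purely illustrative displacement field $U(q) = -A\sin q$ (not taken from the text); in one dimension the single ‘eigenvalue’ is $dU/dq$, so the density relative to the mean is $1/|1 + \tau\, dU/dq|$ and the first caustic appears at $\tau = 1/A$.

```python
import math

def zeldovich_1d(q, tau, amplitude=1.0):
    """Map a Lagrangian coordinate q to its Eulerian position and density
    (in units of the mean) for the illustrative field U(q) = -A sin(q)."""
    x = q - tau * amplitude * math.sin(q)
    jacobian = 1.0 - tau * amplitude * math.cos(q)   # dx/dq = 1 + tau * dU/dq
    return x, 1.0 / abs(jacobian)

def first_caustic_time(amplitude=1.0):
    """tau = -1/lambda_min, where lambda_min = -A is the most negative value
    of dU/dq (reached at q = 0 for this choice of U)."""
    return 1.0 / amplitude

for tau in (0.0, 0.5, 0.9, 0.99):
    _, density = zeldovich_1d(0.0, tau)
    print(f"tau = {tau:4.2f}  density at q = 0: {density:8.2f}")
```

The density at $q = 0$ grows as $1/(1 - \tau)$ here, while at $q = \pi$, where $dU/dq$ is positive, the matter is rarefied; this is the one-dimensional analogue of pancakes forming between emptying voids.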

The Zel’dovich approximation seems to give a good picture of formation of structure in the HDM 
model, but continued unaccelerated motion of particles after shell crossing is clearly unrealistic. A 
useful modification of Zel’dovich’s approximation is to assume that particles move ballistically until 
shell-crossing, at which point they stick together. This is described by Burgers’ equation, and gives
infinitesimally thin walls. This is also obviously unrealistic, but actually receives some justification if 
we think of the Universal expansion adiabatically stretching a self-gravitating sheet. If the thickness 
of the sheet is $T$ and the surface density $\Sigma$, then the acceleration of a particle at the surface is
$\ddot r \sim G\Sigma$, and the frequency of oscillation of particles through the sheet is

$$ \omega \sim \sqrt{\ddot r / T} \sim \sqrt{G\Sigma/T}. \qquad (34.26) $$

Now the surface density decreases as $\Sigma \propto 1/a^2$ for a sheet expanding in the transverse direction at
the Hubble rate, so applying the law of adiabatic invariance to $r = A\cos(\omega t)$ with amplitude $A \propto 1/\sqrt{\omega}$
and requiring $A \sim T$ we find that the (physical) thickness must evolve as

$$ T \propto a^{2/3}. \qquad (34.27) $$

This increases with time, but not as fast as the scale factor a(t), so in comoving coordinates the 
sheet should indeed become thin. 

Another nice feature of the Zel’dovich approximation is that one can compute the non-linear 
power spectrum analytically in terms of the initial power spectrum, as described in detail in chapter 

H 


34.4 Press-Schechter Mass Function 


The Press-Schechter approximation is designed for hierarchical type initial fluctuation fields. It
provides one with a useful approximation for the differential mass function $n(M) = -dN(>M)/dM$,
where the cumulative mass function $N(>M)$ is the comoving number density of bound structures
with mass $> M$.

The idea is to identify two quantities. The first is the fraction of space where the initial
density contrast field $\delta({\bf r})$, when filtered with a kernel of mass $M$, lies above the threshold $\delta_{\rm crit}$ for
formation of non-linear condensations:

$$ f(\delta > \delta_{\rm crit}; M) = \int_{\delta_{\rm crit}/\sigma(M)}^{\infty} \frac{d\nu}{\sqrt{2\pi}} \exp(-\nu^2/2). \qquad (34.28) $$

The second is the fraction of mass in objects more massive than $M$:

$$ f(>M) = \int_M^\infty dM\, M\, n(M). \qquad (34.29) $$

Differentiating (34.28) and (34.29) with respect to $M$ and equating we get

$$ n(M) = \frac{\delta_{\rm crit}\, |d\sigma(M)/dM|}{\sqrt{2\pi}\, M \sigma^2(M)} \exp\left(-\frac{\delta_{\rm crit}^2}{2\sigma^2(M)}\right). \qquad (34.30) $$


While hard to justify rigorously, the idea obviously contains an element of truth, and moreover seems 
to give predictions which agree with the results of N-body experiments. 

If one assumes a power-law spectrum $P(k) \propto k^n$ then the variance as a function of smoothing
mass $M$ is also a power law, $\sigma^2(M) \propto M^{-(n+3)/3}$. In ‘hierarchical models’ (those with $n > -3$) the
mass variance increases with decreasing mass. At sufficiently low masses we must have $\sigma \gg \delta_{\rm crit}$
and the exponential factor becomes close to unity and the theory predicts a power-law differential
mass function. For $n = -2$, for instance, which is the slope of the CDM spectrum around the mass
scale of galaxies, the theory predicts $M\,n(M) \propto M^{-5/6}$, i.e. $n(M) \propto M^{-11/6}$. At high masses, when $\delta_{\rm crit}/\sigma(M)$ starts to
exceed unity the exponential factor becomes very small. The general prediction is for a power-law
mass function which becomes exponentially cut off above some characteristic mass scale; the mass
$M_*$ where $\sigma(M_*) \sim 1$. This is just the kind of behavior seen in the galaxy luminosity function and
also in the cluster mass function. 
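
The two regimes are easy to see numerically. The following sketch evaluates the Press-Schechter form for a pure power-law spectrum and measures its logarithmic slope. The normalization $\sigma(M_*) = 1$, the value `delta_crit = 1.69`, and the function names are assumptions made for illustration only, and the overall amplitude is arbitrary.

```python
import math

def ps_mass_function(mass, n, delta_crit=1.69, m_star=1.0):
    """Press-Schechter n(M), up to an overall constant, for P(k) ~ k**n,
    normalised so that sigma(m_star) = 1 (an illustrative choice)."""
    alpha = (n + 3.0) / 6.0
    sigma = (mass / m_star) ** (-alpha)
    dsigma_dm = -alpha * sigma / mass
    nu = delta_crit / sigma
    prefactor = delta_crit * abs(dsigma_dm) / (math.sqrt(2.0 * math.pi) * mass * sigma ** 2)
    return prefactor * math.exp(-0.5 * nu ** 2)

def log_slope(mass, n, step=1.01):
    """Numerical d ln n / d ln M by a centred difference in log mass."""
    hi = ps_mass_function(mass * step, n)
    lo = ps_mass_function(mass / step, n)
    return math.log(hi / lo) / (2.0 * math.log(step))

print(log_slope(1e-12, n=-2.0))   # low-mass end: close to (n + 3)/6 - 2
print(log_slope(30.0, n=-2.0))    # well above M_*: much steeper
```

Far below $M_*$ the slope tends to $(n+3)/6 - 2$, while above $M_*$ the exponential factor takes over and the slope steepens without limit.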


34.5 Biased Clustering 

In the Press-Schechter theory, collapsed objects are associated with regions where the initial
over-density, smoothed on an appropriate mass-scale, is sufficiently large. A consequence of this is that
objects on the high end of the mass function (those with $M \gg M_*$, that is) will tend to have
amplified large-scale clustering properties. Their clustering is said to be positively biased. The
effect is illustrated in figure 34.3, which shows how the density of over-dense regions is modulated by
long wavelength modes of the density field. This effect is fairly obvious, but what is less obvious is
how the strength of the modulation increases as one raises the threshold. This is a consequence of a
peculiar property of the Gaussian distribution. The Gaussian distribution is $P(\nu) \propto \exp(-\nu^2/2)$. In
the vicinity of some value $\nu = \nu_0$ the distribution for $\Delta\nu = \nu - \nu_0$ is $P(\Delta\nu) \propto \exp(-(\nu_0 + \Delta\nu)^2/2)$.
Expanding the quadratic factor in the exponential, and assuming $\nu_0 \gg \Delta\nu$ gives

$$ P(\Delta\nu) \propto \exp(-\nu_0 \Delta\nu). \qquad (34.31) $$

Thus a Gaussian looks locally exponential: $P(\Delta\nu) \propto \exp(-\Delta\nu/\sigma_{\Delta\nu})$ with exponential scale length
$\sigma_{\Delta\nu} = 1/\nu_0$ which decreases with increasing $\nu_0$. Thus the further out we go on the tail of a Gaussian
the steeper the distribution becomes.

If we add a positive background field $\delta_b$, the fractional change in the probability to exceed the
threshold is then $\Delta P/P = \nu_0 \delta_b/\sigma$. The fluctuation in the number density of upward excursions is
then

$$ 1 + \frac{\delta n}{n} = 1 + b\,\delta_b \qquad (34.32) $$

where the bias factor is

$$ b = \frac{\nu_0}{\sigma} = \frac{\delta_{\rm crit}}{\sigma^2}. \qquad (34.33) $$

Since $\delta_{\rm crit}$ is constant here, the bias factor rapidly increases with the mass of the objects (because
$\sigma^2(M)$ decreases with increasing mass). This is the linearized bias; valid for very small $\delta_b$, such that
$b\delta_b \ll 1$. It is not difficult to show that for $\delta_b \ll 1$, the density of upward fluctuations is proportional
to $\exp(b\delta_b)$. Thus the density of objects is the exponential of the background field.
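
The claim that the response to a background field grows with the height of the threshold can be checked against the exact Gaussian tail. In the sketch below the function names are illustrative, and the background is expressed in units of $\sigma$, so the measured slope should approach $\nu_0$ for high thresholds.

```python
import math

def tail_probability(nu):
    """P(> nu) for a unit-variance Gaussian."""
    return 0.5 * math.erfc(nu / math.sqrt(2.0))

def fractional_boost(nu0, delta_b_over_sigma):
    """Fractional change in the probability of exceeding the threshold when a
    background delta_b (in units of sigma) lowers the effective threshold."""
    return tail_probability(nu0 - delta_b_over_sigma) / tail_probability(nu0) - 1.0

shift = 1e-3
for nu0 in (1.0, 3.0, 5.0):
    measured = fractional_boost(nu0, shift) / shift
    print(f"nu0 = {nu0}: measured slope {measured:.3f}, high-threshold estimate {nu0:.3f}")
```

The agreement improves as $\nu_0$ increases; for thresholds near $1\sigma$ the locally exponential approximation is only rough, which is one reason the argument is most secure for rare objects such as rich clusters.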

One solid application of this theory is to clusters of galaxies; these are the most massive
gravitationally collapsed objects, and so are naturally identified with particularly high peaks. For a long
time, the very strong clustering of such objects was a puzzle; they have a correlation length of about 
20Mpc as compared to about 5Mpc for galaxies. Now we understand that this is just about what 
one would expect given Gaussian initial density fluctuations. It is tempting to apply this theory also 
to galaxies, but there the connection between theory and observation is more tenuous. However, 
at high redshift one would expect the rare, most massive galaxies to be the analog of very massive 
clusters today, and this theory then provides a natural explanation for the rather strong clustering 
of ‘Lyman-break’ galaxies at z ~ 3. 


34.6 Self-Similar Clustering 

Another useful approximation for ‘hierarchical’ initial conditions is the self-similar evolution model.
The idea is that if the initial spectrum of fluctuations approximates a power law over some range of
wave-number then the structure should evolve in such a way that the non-linear density field at one
time is a scaled replica of the field at another time, where the scaling factor in mass say is given by
$M_*(t)$; the nominal mass going non-linear.

For a power law spectrum $P(k) \propto k^n$, the mass variance scales as $\sigma^2 \sim (k^3 P(k))_{k=1/r} \propto r^{-(n+3)}$.
The root mean squared mass fluctuations grow with time in proportion to the scale factor $a(t)$, and
therefore $\sigma$ scales with $M \propto r^3$ and $a$ as

$$ \sigma(M, t) \propto a M^{-(n+3)/6}. \qquad (34.34) $$

The mass-scale of non-linearity ($\sigma \sim 1$) therefore scales as

$$ M_* \propto a^{6/(n+3)} \qquad (34.35) $$

and the scaling law is that the fraction of mass per log interval of mass is a function only of $M/M_*$:

$$ M^2 n(M, t) = F(M/M_*(t)) \qquad (34.36) $$

(see figure 34.4). This relation does not specify anything about the universal function $F(y)$, but
it does allow one to predict the mass function at redshift $z > 0$ given the form at $z = 0$.
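
The rescaling can be written down in a few lines. The sketch below assumes an Einstein - de Sitter background, so that $a = 1/(1+z)$, and takes the present-day mass function as a user-supplied callable; the function names and the toy mass function used in the example are hypothetical.

```python
import math

def mstar_ratio(z, n):
    """M_*(z) / M_*(0) for an Einstein-de Sitter background, using
    M_* ~ a**(6/(n+3)) with a = 1/(1+z)."""
    return (1.0 + z) ** (-6.0 / (n + 3.0))

def mass_function_at_z(mass, z, n, mass_function_today):
    """Rescale a z = 0 mass function n(M) to redshift z, holding
    M**2 n(M, t) = F(M / M_*(t)) fixed."""
    ratio = mstar_ratio(z, n)
    mass_today = mass / ratio                      # same value of M / M_*
    return mass_function_today(mass_today) * (mass_today / mass) ** 2

def toy_mass_function(mass):
    """Hypothetical z = 0 form with F(y) = exp(-y) and M_*(0) = 1."""
    return math.exp(-mass) / mass ** 2

print(mstar_ratio(1.0, -1.0))
print(mass_function_at_z(0.125, 1.0, -1.0, toy_mass_function))
```

For $n = -1$ the characteristic mass at $z = 1$ is a factor 8 below its present value, whereas for $n = -2$ it is a factor 64 lower, which is why the evolution of the cut-off is a sensitive probe of the spectral index.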
Figure 34.3: The upper panel shows a realization of a Gaussian random noise field. This is supposed
to represent the initial Gaussian density perturbation field $\delta({\bf r})$. The horizontal dashed line is
supposed to represent the threshold density required in order for a region to have collapsed. The
lower trace shows the ‘excursion set’ for this threshold (here taken to be 1.8 times the root mean
squared fluctuation). This function is one or zero depending on whether $f(x)$ exceeds the threshold.
The positive parts of the excursion set are randomly distributed with position. The lower panel shows 
the same thing, but where we have added a long-wavelength sinusoidal ‘background’ field. Clearly, 
and not surprisingly, the background field has modulated the density of the regions exceeding the 
critical threshold. 
Figure 34.4: If the initial spectrum of density fluctuations has a power-law form, the differential
mass function $n(M)$ evolves such that $M^2 n(M, t)$ is a universal function $F(M/M_*(t))$. Here $M_*(t)$
is the mass going non-linear at time $t$. Self-similarity does not tell one the form of the universal
function $F(y)$ (sketched here as a Schechteresque function) but it does enable one to retrodict the
mass function at earlier times from observations of the current mass function. Since the evolution 
of the characteristic mass-scale M*(t) depends on the initial spectral index, this provides a useful 
test of cosmological theory. 


This model provides a simple, but apparently quite accurate, model for the evolution of structures 
in CDM-like models where the slope of the spectrum varies quite gradually with mass. The models 
are parameterized by n, the effective slope of the power spectrum on mass scales of interest. For 
CDM this varies from $n \simeq -1$ on the scale of clusters to $n \simeq -2$ on the scale of galaxies.

The scaling makes no assumption about the statistical nature of the density field and is applicable 
to e.g. the cosmic string model for structure formation. 


34.7 Davis and Peebles Scaling Solution 


The Davis and Peebles scaling solution goes beyond the self-similar scaling and attempts to determine the slope of the two-point
correlation function in the non-linear regime from the slope $n$ of the initial power spectrum, assumed
to be power-law like with $P(k) \propto k^n$.

The original discussion was couched in terms of the BBGKY hierarchy, but the essential result 
can be easily obtained from conservation of energy considerations, much as we did for the accretion 
onto a point mass. 

With the initial spectrum for the density fluctuations $\delta$ and with $\nabla^2 \phi = 4\pi G\bar\rho\,\delta$, so $\delta\phi_k =
4\pi G\bar\rho\,\delta_k/k^2$, the root mean square potential fluctuations on scale $r$ are

$$ \langle\delta\phi^2\rangle^{1/2} \propto \left[ \int d^3k\, k^{-4} k^n W_r(k) \right]^{1/2} \qquad (34.37) $$

where $W_r(k)$ is the transform of the smoothing kernel, which falls rapidly for $k \gg 1/r$. This gives

$$ \langle\delta\phi^2\rangle^{1/2} \propto r^{(1-n)/2}. \qquad (34.38) $$

In terms of mass scale, $M \propto r^3$, this is

$$ \langle\delta\phi^2\rangle^{1/2} \propto M^{(1-n)/6}. \qquad (34.39) $$

On the other hand, in the non-linear regime, we have a power-law mass auto-correlation function
$\xi(r) \propto r^{-\gamma}$. Now imagine the mass distribution to be a set of randomly distributed clumps of
size $r$ and over-density $\delta_* \gg 1$. The fraction of space occupied by the clumps is $f \sim 1/\delta_*$, so the
density fluctuation variance is $\xi(r) \sim \langle\delta^2\rangle \sim f\delta_*^2 \sim \delta_*$. The mass of a lump is $M \sim \bar\rho\,\delta_* r^3$, so the
characteristic mass of clumps of size $r$ is $M \propto r^{3-\gamma}$. The binding energy of clumps then scales with
their radius and mass as

$$ \delta\phi \sim M/r \propto r^{2-\gamma} \propto M^{(2-\gamma)/(3-\gamma)}. \qquad (34.40) $$

Equating (34.39) and (34.40), we obtain the relation

$$ \gamma = \frac{9 + 3n}{5 + n} \qquad (34.41) $$


which would fit with the empirically observed slope $\gamma \simeq 1.8$ for a white noise spectrum $n = 0$.
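
The algebra behind this relation can be verified exactly. The sketch below solves $(1-n)/6 = (2-\gamma)/(3-\gamma)$ for $\gamma$ using rational arithmetic and checks the result for a few spectral indices; the function name is an illustrative choice.

```python
from fractions import Fraction

def correlation_slope(n):
    """gamma obtained by equating (1 - n)/6 with (2 - gamma)/(3 - gamma)."""
    n = Fraction(n)
    return (9 + 3 * n) / (5 + n)

for n in (0, -1, -2):
    gamma = correlation_slope(n)
    # Consistency check against the two energy exponents being equated.
    assert (1 - Fraction(n)) / 6 == (2 - gamma) / (3 - gamma)
    print(n, gamma, float(gamma))
```

Note how steeply the predicted slope depends on $n$: it falls from 9/5 for white noise to 1 for $n = -2$.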

While the derivation here is similar to that for spherical accretion, the result is much less robust. 
While it makes perfect sense to say that the binding energy of structures when they first form is 
given, within a geometrical factor of order unity, by the initial binding energy, the calculation here 
assumes that even when much larger mass objects have collapsed, the small clumps still preserve 
the binding energy with which they are born. This is not likely to be the case, as there will be 
transfer of energy between the different scales of the hierarchy. As we have argued above, entropy 
considerations suggest that such interactions will tend to erase sub-structure. Numerical simulations 
do not provide much support for this theory. 


34.8 Cosmic Virial Theorem 

The cosmic virial theorem (Davis and Peebles again) attempts to relate the low order correlation 
functions for galaxies to the relative motions of galaxies and thereby obtain an estimate of the 
mass-to-light ratio of mass clustered along with galaxies. 

In essence, their argument is as follows: Assume that galaxies cluster like the mass; this means
that the excess mass within distance $r$ of a galaxy grows like $M \propto \int^r d^3r\, \xi(r) \propto r^{3-\gamma}$. The potential
well depth is then $\delta\phi \sim GM/r \propto r^{2-\gamma}$. One would expect the relative velocity of galaxies at
separation $r$ to scale as

$$ \sigma^2(r) \propto r^{2-\gamma} \sim r^{0.2}. \qquad (34.42) $$

This prediction seems to be remarkably well obeyed on scales from a few tens of kpc out to about
1 Mpc (and one would not expect the result to hold at larger separations where things have yet to
stabilize anyway).

From the size of the peculiar motions, one infers that the mass-to-light ratio of material clustered
around galaxies on scale $\sim 1$ Mpc or less is $M/L \simeq 300h$ in solar units. If representative of
the universal value, this would imply $\Omega \simeq 0.2$. This is similar to the mass-to-light ratio from
virial analysis of individual clusters of galaxies, and provides strong supporting evidence for copious
amounts of dark matter. It also supports the hypothesis that the galaxies cluster like the mass,
and therefore that the universal density parameter is $\Omega \simeq 0.2$ rather than the aesthetically pleasing
$\Omega = 1$.
Appendix A

Vector Calculus


A.1 Vectors

The prototype vector is the distance between two points in space

$$ {\bf d} = (d_x, d_y, d_z) = d_i. \qquad (A.1) $$

A three component entity $v_i$ is a vector if it transforms under rotations in the same manner as $d_i$.


A.2 Vector Products 


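The scalar product or dot product of two vectors is

$$ {\bf a} \cdot {\bf b} = \sum_{i=1}^{3} a_i b_i = a_i b_i. \qquad (A.2) $$

The cross product or outer product is

$$ {\bf c} = {\bf a} \times {\bf b} = \begin{vmatrix} \hat{\bf x} & \hat{\bf y} & \hat{\bf z} \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{vmatrix} = \begin{pmatrix} a_y b_z - a_z b_y \\ a_z b_x - a_x b_z \\ a_x b_y - a_y b_x \end{pmatrix}. \qquad (A.3) $$


A.3 Div, Grad and Curl

The gradient operator is

$$ \nabla = (\partial_x, \partial_y, \partial_z) = \partial_i \qquad (A.4) $$

with $\partial_i = \partial/\partial x_i$.

The gradient of a scalar field $f({\bf r})$ is a vector

$$ \nabla f = \partial_i f. \qquad (A.5) $$

The divergence of a vector field ${\bf v}({\bf r})$ is a scalar

$$ \nabla \cdot {\bf v} = \frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y} + \frac{\partial v_z}{\partial z} = \frac{\partial v_i}{\partial x_i}. \qquad (A.6) $$

The Laplacian operator is $\nabla^2 = \nabla \cdot \nabla$ and yields a scalar when applied to a scalar and a vector
when applied to a vector.

The Laplacian of a spherically symmetric function $f(r)$ is

$$ \nabla^2 f = \frac{1}{r^2} \frac{d}{dr}\left( r^2 \frac{df}{dr} \right). \qquad (A.7) $$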
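The curl of a vector field ${\bf v}({\bf r})$ is a vector

$$ \nabla \times {\bf v} = \begin{vmatrix} \hat{\bf x} & \hat{\bf y} & \hat{\bf z} \\ \partial_x & \partial_y & \partial_z \\ v_x & v_y & v_z \end{vmatrix} = \begin{pmatrix} \partial_y v_z - \partial_z v_y \\ \partial_z v_x - \partial_x v_z \\ \partial_x v_y - \partial_y v_x \end{pmatrix}. \qquad (A.8) $$

The curl of the curl of a vector field is

$$ \nabla \times (\nabla \times {\bf v}) = \nabla(\nabla \cdot {\bf v}) - \nabla^2 {\bf v}. \qquad (A.9) $$

The divergence of the curl of a vector field vanishes

$$ \nabla \cdot (\nabla \times {\bf v}) = 0. \qquad (A.10) $$

The divergence of the cross-product of two vector fields is

$$ \nabla \cdot ({\bf A} \times {\bf B}) = {\bf B} \cdot (\nabla \times {\bf A}) - {\bf A} \cdot (\nabla \times {\bf B}). \qquad (A.11) $$

Another identity which is useful in fluid dynamics is

$$ ({\bf u} \cdot \nabla){\bf u} = \frac{1}{2}\nabla u^2 - {\bf u} \times (\nabla \times {\bf u}). \qquad (A.12) $$
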

The above vector identities are all readily verified by writing out the cross-products using the 
determinant form for the cross product. 
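
Besides the algebraic proof, an identity of this kind can be spot-checked numerically, which is a useful guard against sign errors. The sketch below tests the fluid-dynamics identity $({\bf u}\cdot\nabla){\bf u} = \frac{1}{2}\nabla u^2 - {\bf u}\times(\nabla\times{\bf u})$ at one point using central differences; the test field, the step size and all function names are arbitrary illustrative choices.

```python
def u(p):
    """An arbitrary smooth test field (purely illustrative)."""
    x, y, z = p
    return (y * z, x * x, x * y + z * z)

def partial(func, p, j, h=1e-5):
    """Central-difference derivative of func (scalar or tuple valued) along axis j."""
    up = list(p); up[j] += h
    dn = list(p); dn[j] -= h
    fu, fd = func(up), func(dn)
    if isinstance(fu, tuple):
        return tuple((a - b) / (2 * h) for a, b in zip(fu, fd))
    return (fu - fd) / (2 * h)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def identity_residual(p):
    """Largest component of (u.grad)u - [grad(u^2)/2 - u x curl u] at point p."""
    v = u(p)
    du = [partial(u, p, j) for j in range(3)]        # du[j][i] = d u_i / d x_j
    advective = [sum(v[j] * du[j][i] for j in range(3)) for i in range(3)]
    half_u2 = lambda q: 0.5 * sum(c * c for c in u(q))
    grad_half_u2 = [partial(half_u2, p, i) for i in range(3)]
    curl = (du[1][2] - du[2][1], du[2][0] - du[0][2], du[0][1] - du[1][0])
    rhs = [g - c for g, c in zip(grad_half_u2, cross(v, curl))]
    return max(abs(a - b) for a, b in zip(advective, rhs))

print(identity_residual((0.3, -1.2, 0.7)))
```

The residual should be at the level of the finite-difference error. Flipping the sign of the cross-product term makes it of order unity, so the check distinguishes the two sign conventions one sometimes sees quoted.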


A.4 The Divergence Theorem 

The divergence theorem says that the volume integral of the divergence of a vector field ${\bf v}({\bf r})$ is equal
to the integral over the surface of the volume of the normal component of ${\bf v}$:

$$ \int_V d^3r\, \nabla \cdot {\bf v} = \oint_S d{\bf A} \cdot {\bf v} \qquad (A.13) $$

and can be proven by integrating by parts.


A.5 Stokes’ Theorem 


The integral of the normal component of the curl of a vector field ${\bf v}$ over a surface is equal to the
loop integral of the tangential component of the field around the perimeter:

$$ \int_S d{\bf A} \cdot (\nabla \times {\bf v}) = \oint d{\bf l} \cdot {\bf v}. \qquad (A.14) $$


A.6 Problems 

A.6.1 Vector Calculus Identities 

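The cross product of two vectors ${\bf a}$, ${\bf b}$ can be expressed as the determinant of the matrix

$$ {\bf c} = {\bf a} \times {\bf b} = \begin{vmatrix} \hat{\bf x} & \hat{\bf y} & \hat{\bf z} \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{vmatrix} = \begin{pmatrix} a_y b_z - a_z b_y \\ a_z b_x - a_x b_z \\ a_x b_y - a_y b_x \end{pmatrix} \qquad (A.15) $$

and the same is true if the components of the vector ${\bf a}$ are differential operators: ${\bf a} = \nabla = (\partial_x, \partial_y, \partial_z)$.
Use this result to show that:

1. The curl of the curl of a vector field is

$$ \nabla \times (\nabla \times {\bf v}) = \nabla(\nabla \cdot {\bf v}) - \nabla^2 {\bf v} \qquad (A.16) $$

2. The divergence of the curl of a vector field vanishes

$$ \nabla \cdot (\nabla \times {\bf v}) = 0 \qquad (A.17) $$

3. The divergence of the cross-product of two vector fields is

$$ \nabla \cdot ({\bf A} \times {\bf B}) = {\bf B} \cdot (\nabla \times {\bf A}) - {\bf A} \cdot (\nabla \times {\bf B}) \qquad (A.18) $$

4. and finally that

$$ ({\bf u} \cdot \nabla){\bf u} = \frac{1}{2}\nabla u^2 - {\bf u} \times (\nabla \times {\bf u}). \qquad (A.19) $$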
Appendix B


Fourier Transforms

B.1 Discrete Fourier Transform

Consider a set of $N$ uniformly spaced samples of a 1-dimensional scalar function $F_X$, $X = 0, \ldots, N-1$.
Define the discrete Fourier transform as

$$ \tilde F_K = \sum_{X=0}^{N-1} F_X e^{i 2\pi K X/N}. \qquad (B.1) $$

Now consider the function

$$ G_X = \sum_{K=0}^{N-1} \tilde F_K e^{-i 2\pi K X/N}; \qquad (B.2) $$

substituting for $\tilde F_K$ from (B.1) this becomes

$$ G_X = \sum_{X'=0}^{N-1} F_{X'} \sum_{K=0}^{N-1} e^{i 2\pi K(X'-X)/N}. \qquad (B.3) $$

The second sum here is a simple geometric series, with value

$$ \sum_{K=0}^{N-1} e^{i 2\pi K(X'-X)/N} = \frac{1 - e^{i 2\pi (X'-X)}}{1 - e^{i 2\pi (X'-X)/N}}; \qquad (B.4) $$

the numerator here is zero for all $X' - X$ (since $X' - X$ is an integer) whereas the denominator is
finite unless $X' - X$ is a multiple of $N$. For $X$, $X'$ in the range $0, \ldots, N-1$ the sum vanishes
unless $X' - X = 0$, in which case it has value $N$, so we have

$$ \sum_{K=0}^{N-1} e^{i 2\pi K X/N} = N\delta_X \qquad (B.5) $$

where $\delta_X$ is the discrete delta-function which is unity or zero if $X$ is zero or non-zero respectively.
It follows from (B.3) that $G_X = N F_X$, from which we obtain the inverse discrete Fourier transform

$$ F_X = \frac{1}{N} \sum_{K=0}^{N-1} \tilde F_K e^{-i 2\pi K X/N}. \qquad (B.6) $$

Equations (B.1), (B.6) allow one to reversibly transform from real-space to Fourier-space
representation and vice versa.
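
The forward and inverse transforms can be coded directly from (B.1) and (B.6). The sketch below is a deliberately naive $O(N^2)$ implementation meant only to make the sign and normalization conventions concrete; the function names are illustrative, and a production code would use a fast Fourier transform instead.

```python
import cmath

def dft(samples):
    """Forward transform with the sign convention of (B.1): exp(+i 2 pi K X / N)."""
    n = len(samples)
    return [sum(samples[x] * cmath.exp(2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

def inverse_dft(coeffs):
    """Inverse transform (B.6): exp(-i 2 pi K X / N) and an overall factor 1/N."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(-2j * cmath.pi * k * x / n) for k in range(n)) / n
            for x in range(n)]

samples = [1.0, 2.0, 0.5, -1.0, 3.0]
recovered = inverse_dft(dft(samples))
print(max(abs(r - s) for r, s in zip(recovered, samples)))
```

Be aware that many numerical libraries (NumPy's `numpy.fft.fft`, for example) put the minus sign in the forward transform, the opposite of the convention used here, so results need complex conjugation, or a swap of forward and inverse routines, when comparing.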
B.2 Continuous Fourier Transform


We can translate the foregoing results for discrete transforms by converting sums to integrals. Let
the real-space domain have length $L$ so the real-space coordinate is $x = X\Delta x$, where the ‘pixel’ size
is $\Delta x = L/N$, and define $f(x) = F_{X = x/\Delta x}$. Similarly define the continuous wave number $k = K\Delta k$
with $\Delta k = 2\pi/L$ and set $\tilde f(k) = \Delta x\, \tilde F_{K = k/\Delta k}$ to obtain from (B.1)

$$ \tilde f(k) = \sum_{X=0}^{N-1} \Delta x\, f(x) e^{ikx} \rightarrow \int dx\, f(x) e^{ikx} \qquad (B.7) $$

and the discrete inverse transform becomes

$$ f(x) = \frac{1}{N \Delta k \Delta x} \sum_{K=0}^{N-1} \Delta k\, \tilde f(k) e^{-ikx} \rightarrow \int \frac{dk}{2\pi} \tilde f(k) e^{-ikx}. \qquad (B.8) $$

It is also of interest to convert the expression for the discrete $\delta$-function to a continuous integral:

$$ \int \frac{dk}{2\pi} e^{ikx} \leftrightarrow \frac{\Delta k}{2\pi} \sum_{K=0}^{N-1} e^{i 2\pi K X/N} = \frac{N \Delta k}{2\pi} \delta_X = \frac{\delta_X}{\Delta x}. \qquad (B.9) $$

The final expression here has value $1/\Delta x$ if we are in the zeroth pixel $X = 0$, or equivalently if
$0 < x < \Delta x$, so in the continuum limit $\Delta x \rightarrow 0$ this is a representation of the delta function and we
have

$$ \int \frac{dk}{2\pi} e^{ikx} = \delta(x) \qquad (B.10) $$

where the ‘Dirac $\delta$-function’ $\delta(x)$ has the property that for any function $h(x)$

$$ \int dx'\, h(x') \delta(x - x') = h(x). \qquad (B.11) $$

Note that if $a > 0$ is held constant inside an integral

$$ \int dy\, f(y) \delta(ay) = \int \frac{dz}{a} f(z/a) \delta(z) = f(0)/a = \int dy\, f(y) \delta(y)/a \qquad (B.12) $$

thus

$$ \delta(ay) = \delta(y)/a \qquad (B.13) $$

which is a useful result.

The generalization to $M$-dimensional space is straightforward and we have

$$ \tilde f({\bf k}) = \int d^M x\, f({\bf x}) e^{i{\bf k}\cdot{\bf x}}, \qquad f({\bf x}) = \int \frac{d^M k}{(2\pi)^M} \tilde f({\bf k}) e^{-i{\bf k}\cdot{\bf x}} \qquad (B.14) $$

and

$$ \int \frac{d^M k}{(2\pi)^M} e^{i{\bf k}\cdot{\bf x}} = \delta({\bf x}). \qquad (B.15) $$

Note that if $f({\bf x})$ is real then $\tilde f(-{\bf k}) = \tilde f({\bf k})^*$.
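B.3 Parseval’s Theorem


We now derive Parseval’s theorem for discrete functions and then for continuous functions. Consider
the sum of the squared values of a discrete function $F_X$. Invoking (B.6) we can write this as

$$ \begin{aligned} \sum_X F_X^2 &= \sum_X \frac{1}{N} \sum_K \tilde F_K e^{-i 2\pi X K/N}\, \frac{1}{N} \sum_{K'} \tilde F^*_{K'} e^{i 2\pi X K'/N} \\ &= \frac{1}{N^2} \sum_K \sum_{K'} \tilde F_K \tilde F^*_{K'} \sum_X e^{-i 2\pi X (K - K')/N} \\ &= \frac{1}{N} \sum_K \sum_{K'} \tilde F_K \tilde F^*_{K'} \delta_{K-K'} = \frac{1}{N} \sum_K |\tilde F_K|^2 \end{aligned} \qquad (B.16) $$

and for continuous functions we have

$$ \begin{aligned} \int dx\, f^2(x) &= \int dx \int \frac{dk}{2\pi} \tilde f(k) e^{-ikx} \int \frac{dk'}{2\pi} \tilde f(k')^* e^{ik'x} \\ &= \int \frac{dk}{2\pi} \int \frac{dk'}{2\pi} \tilde f(k) \tilde f^*(k') \int dx\, e^{-i(k - k')x} \\ &= \int \frac{dk}{2\pi} \int \frac{dk'}{2\pi} \tilde f(k) \tilde f^*(k')\, 2\pi \delta(k - k') = \int \frac{dk}{2\pi} |\tilde f(k)|^2. \end{aligned} \qquad (B.17) $$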
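
A small numerical check of the discrete form of Parseval's theorem is given below. It repeats a naive transform with the exp(+i 2 pi K X / N) sign convention so that the block is self-contained; the function names and the sample values are illustrative.

```python
import cmath

def dft(samples):
    """Naive forward transform, sum over X of F_X exp(+i 2 pi K X / N)."""
    n = len(samples)
    return [sum(samples[x] * cmath.exp(2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

def parseval_sides(samples):
    """Return (sum of F_X^2, (1/N) sum of |F_K|^2) for real samples."""
    real_space = sum(s * s for s in samples)
    fourier_space = sum(abs(c) ** 2 for c in dft(samples)) / len(samples)
    return real_space, fourier_space

print(parseval_sides([1.0, 2.0, 0.5, -1.0, 3.0]))
```

The two numbers agree to rounding error. The factor $1/N$ on the Fourier side is tied to the unnormalized forward transform; with a different normalization convention the factor moves accordingly.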


B.4 Convolution Theorem 

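We define the convolution of two continuous functions $f(x)$, $g(x)$ to be

$$ c(x) = \int dx'\, f(x') g(x - x') \qquad (B.18) $$

the transform of which is

$$ \begin{aligned} \tilde c(k) &= \int dx\, c(x) e^{ikx} = \int dx \int dx'\, f(x') g(x - x') e^{ikx} \\ &= \int dx \int dx' \int \frac{dk'}{2\pi} \int \frac{dk''}{2\pi} \tilde f(k') \tilde g(k'') e^{i(kx - k'x' - k''(x - x'))} \\ &= \int \frac{dk'}{2\pi} \int \frac{dk''}{2\pi} \tilde f(k') \tilde g(k'') \int dx\, e^{i(k - k'')x} \int dx'\, e^{i(k'' - k')x'} \\ &= \int dk' \int dk''\, \tilde f(k') \tilde g(k'') \delta(k - k'') \delta(k'' - k') \\ &= \tilde f(k) \tilde g(k) \end{aligned} \qquad (B.19) $$
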

so the transform of a convolution of two functions is the product of the individual transforms. A 
direct corollary is that the transform of a product is the convolution of the individual transforms. 
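
The discrete analogue is easily verified. On a finite periodic grid the convolution becomes a circular one, which is an assumption of this sketch (the continuous derivation above has no periodicity); the function names and test data are illustrative.

```python
import cmath

def dft(samples):
    """Naive forward transform, sum over X of F_X exp(+i 2 pi K X / N)."""
    n = len(samples)
    return [sum(samples[x] * cmath.exp(2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

def circular_convolve(f, g):
    """c_X = sum over X' of f_X' g_(X - X'), with indices taken modulo N."""
    n = len(f)
    return [sum(f[xp] * g[(x - xp) % n] for xp in range(n)) for x in range(n)]

f = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0]
g = [0.5, 0.25, 0.0, 0.0, 0.0, 0.25]   # a small smoothing kernel
lhs = dft(circular_convolve(f, g))
rhs = [a * b for a, b in zip(dft(f), dft(g))]
print(max(abs(a - b) for a, b in zip(lhs, rhs)))
```

The direct sum costs $O(N^2)$ operations, whereas multiplying transforms costs $O(N)$ once the transforms are in hand; with a fast transform this is what makes Fourier-space convolution attractive for large data sets.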


B.5 Wiener-Khinchin Theorem 

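The auto-correlation of a function $f$ is

$$\begin{aligned}
\xi(x) &= \frac{1}{L}\int dx'\, f(x')\, f(x' + x) \\
&= \frac{1}{L}\int dx' \int \frac{dk}{2\pi}\, \tilde f(k)\, e^{-ikx'} \int \frac{dk'}{2\pi}\, \tilde f(k')\, e^{-ik'(x + x')} \\
&= \frac{1}{L}\int \frac{dk}{2\pi} \int \frac{dk'}{2\pi}\, \tilde f(k)\, \tilde f(k')\, e^{-ik'x} \int dx'\, e^{-i(k + k')x'} \\
&= \frac{1}{L}\int \frac{dk}{2\pi} \int \frac{dk'}{2\pi}\, \tilde f(k)\, \tilde f(k')\, e^{-ik'x}\, 2\pi\delta(k + k') \\
&= \frac{1}{L}\int \frac{dk}{2\pi}\, \tilde f(-k)\, \tilde f(k)\, e^{-ikx} = \int \frac{dk}{2\pi}\, P(k)\, e^{-ikx}
\end{aligned} \tag{B.20}$$

where we have used $\tilde f(-k) = \tilde f^*(k)$ (since $f(x)$ is real) and where $P(k) = |\tilde f(k)|^2/L$ is the power
spectrum. Thus $\xi(x)$ is the FT of $P(k)$ and vice versa, which is the Wiener-Khinchin theorem. Note
that for a statistically homogeneous random field $\xi(x)$ and $P(k)$ tend in an average sense to a limit
which is independent of the sampling box size $L$.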


B.6 Fourier Transforms of Derivatives and Integrals 

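If

$$F(x) = \frac{df(x)}{dx} \tag{B.21}$$

then by equation (B.8) we have

$$\begin{aligned}
\int \frac{dk}{2\pi}\, \tilde F(k)\, e^{-ikx} &= \frac{d}{dx}\int \frac{dk}{2\pi}\, \tilde f(k)\, e^{-ikx} = \int \frac{dk}{2\pi}\, \tilde f(k)\, \frac{d e^{-ikx}}{dx} \\
&= \int \frac{dk}{2\pi}\, \left(-ik\tilde f(k)\right) e^{-ikx}
\end{aligned} \tag{B.22}$$

so the transform of the derivative of a function is $-ik$ times the transform of the function:

$$\tilde F(k) = -ik\tilde f(k) \tag{B.23}$$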

One can also obtain this result by integration by parts. 

The transform of the integral of a function is similarly equal to the transform of that function 
divided by —ik. This is ill-defined for k = 0. This reflects the fact that an integral is determined 
only up to an additive constant, and the k = 0 term in a Fourier expansion is just this constant. 


B.7 Fourier Shift Theorem 

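The Fourier transform of a shifted field $f'(x) = f(x + d)$ follows from substituting $y = x + d$ in the forward transform. With the sign convention of (B.14) it is given by

$$\tilde f'(k) = e^{-ikd}\, \tilde f(k). \tag{B.24}$$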

B.8 Utility of Fourier Transforms 

One use for Fourier transforms in astrophysics is to convert differential equations to algebraic equations,
with the ‘grad’ operator $\nabla$ becoming multiplication by $-i\mathbf{k}$ in the Fourier domain.

Fourier transforms are a great help in describing random processes — and especially statistically 
homogeneous random processes. 

Fourier transforms are computationally very useful for convolving or de-convolving data. Say you
want to convolve an $N \times N$ image with some extended smoothing kernel. At face value this would
appear to take $O(N^4)$ operations (for each of the $N^2$ ‘destination image’ pixels you need to sum over
$N^2$ ‘source image’ pixels). The operation count is smaller for a smaller kernel, but still extremely
expensive for large images. The great advantage of performing such smoothing in Fourier transform
space is that the fast Fourier transform algorithm requires only $O(N^2 \log N^2)$ operations. Thus, the
preferred method to convolve an image with a kernel is to transform both with the FFT, multiply
the transforms and then inverse transform the result.
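The recipe in the last sentence can be sketched in a few lines. This is a minimal illustration, assuming NumPy's FFT routines and periodic (wrap-around) boundary conditions; the function names `convolve_direct` and `convolve_fft` and the Gaussian kernel are illustrative choices, not part of the notes.

```python
import numpy as np

def convolve_direct(image, kernel):
    """Periodic convolution by explicit summation: O(N^4) for N x N arrays."""
    n0, n1 = image.shape
    out = np.zeros((n0, n1))
    for i in range(n0):
        for j in range(n1):
            for a in range(n0):
                for b in range(n1):
                    out[i, j] += image[a, b] * kernel[(i - a) % n0, (j - b) % n1]
    return out

def convolve_fft(image, kernel):
    """Same convolution via the convolution theorem: O(N^2 log N^2)."""
    product = np.fft.fft2(image) * np.fft.fft2(kernel)
    return np.real(np.fft.ifft2(product))

rng = np.random.default_rng(1)
N = 8
image = rng.normal(size=(N, N))

# Gaussian smoothing kernel centred on pixel (0, 0) and wrapped periodically.
offsets = np.fft.fftfreq(N, d=1.0 / N)        # pixel offsets 0, 1, ..., -2, -1
dx, dy = np.meshgrid(offsets, offsets, indexing="ij")
kernel = np.exp(-(dx**2 + dy**2) / (2.0 * 1.5**2))
kernel /= kernel.sum()

smoothed_direct = convolve_direct(image, kernel)
smoothed_fft = convolve_fft(image, kernel)
print(np.max(np.abs(smoothed_direct - smoothed_fft)))
```

For real data one would normally pad the image to avoid wrap-around at the edges; the periodic version is used here only because it makes the comparison with the direct sum exact.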


B.9 Commonly Occurring Transforms 


Here we describe some useful common transforms (see figure B.1). We have taken some liberties
with normalization.


• The FT of a $\delta$-function at the origin $f(\mathbf{r}) = \delta(\mathbf{r})$ is a constant. If the $\delta$-function is shifted, so
$f(\mathbf{r}) = \delta(\mathbf{r} - \mathbf{r}_0)$, the constant is multiplied by $\exp(i\mathbf{k}\cdot\mathbf{r}_0)$.

• A 1-dimensional $\delta$-function is the derivative of a step function. Hence the transform of a step
function is proportional to $1/k$.

• The transform of a Gaussian $f(r) = \exp(-r^2/2\sigma^2)$ is another Gaussian $\tilde f(k) = \exp(-k^2\sigma^2/2)$.
This generalizes the result for a $\delta$-function, which can be thought of as a Gaussian with
$\sigma \to 0$. Fine-scale structures in real space correspond to extended features in transform space.

• The transform of a ‘box-car’ or ‘top-hat’ function $f(r) = 1$ for $|r| < L/2$ is the ‘sinc’ function:
$\tilde f(k) = \sin(kL/2)/(kL/2)$.

• The transform of a ‘comb’ function $c_{\Delta r}(r) = \sum_n \delta(r - n\Delta r)$ is another comb function:
$\tilde c(k) = c_{2\pi/\Delta r}(k) = \sum_n \delta(k - 2\pi n/\Delta r)$. Comb functions are often referred to as ‘Shah’ functions.

Figure B.1: A variety of Fourier transform pairs.


B.10 The Sampling Theorem 


As a useful and interesting application of some of the foregoing results consider the transform of
a ‘pixellated’ image. Say we have some source image $f(r)$ which we measure at a set of locations
separated by $\Delta r$ (see figure B.2). The result is a set of $\delta$-function samples $f_{\rm pix}(r) = c_{\Delta r} \times f$. The
transform of these pixel values is, by the convolution theorem, $\tilde f_{\rm pix}(k) = c_{2\pi/\Delta r} \otimes \tilde f$. This is the sum
of a set of replicas of the transform of the source image placed at locations $k = 2\pi n/\Delta r$.

Now suppose that the original signal is ‘band-limited’, so $\tilde f(k) \neq 0$ only for $|k| < k_{\max}$. This
is not at all an unreasonable assumption; any astronomical telescope produces images which are
strictly band-limited. If the cut-off frequency is less than one half of the spacing between the $\delta$-functions of the
comb $c_{2\pi/\Delta r}$ then the replicas do not overlap. This means that the transform of the source image can
be recovered exactly from the transform of the pixels, simply by multiplying by a box-car function
of width $2k_{\max}$:

$$\tilde f(k) = W(k) \times \tilde f_{\rm pix}(k) \tag{B.25}$$
Figure B.2: Illustration of the sampling theorem. The upper left plot shows a random function
$f(r)$ and its transform $\tilde f(k)$ is shown schematically in the upper right plot. Sampling $f(r)$ at a set
of uniformly spaced points yields the function $f_s(r)$ (a set of $\delta$-functions) whose transform $\tilde f_s(k)$
is the superposition of a set of replicas of the transform of the original field on a grid of spacing
$\Delta k = 2\pi/\Delta r$. If the sampling rate is sufficiently high then these replicas do not overlap. It is then
possible to recover the transform of the original field simply by windowing $\tilde f_s$ with a box-car $W(k)$.
Inverse transforming recovers the original field $f(r)$. Equivalently, in real-space, one can recover the
original field by convolving the sampled field with a ‘sinc’ function which is the transform of the box
car.

where $W(k) = 1$ if $|k| < k_{\max}$ and zero otherwise.

The value of the source image at position $r_0$ is

$$f(r_0) = \int \frac{dk}{2\pi}\, \tilde f(k)\, e^{-ikr_0} = \int \frac{dk}{2\pi}\, W(k)\, \tilde f_{\rm pix}(k)\, e^{-ikr_0} \tag{B.26}$$

but $\tilde f_{\rm pix}(k) = \sum_p f_p\, e^{ikr_p}$ where $f_p$ is the value of the pixel at position $r_p$. Hence the source image
value can be recovered exactly from the finite number of samples $f_p$ as

$$f(r_0) = \sum_p f_p \int \frac{dk}{2\pi}\, W(k)\, e^{ik(r_p - r_0)} = \sum_p f_p\, W(r_p - r_0). \tag{B.27}$$

The transform of the box-car is $W(r) = {\rm sinc}(k_{\max} r)$, so this says we can recover the source image
value at any position, not necessarily at a measured point, by combining the measured values
with weights equal to ${\rm sinc}(k_{\max} r)$. This is commonly referred to as sinc interpolation.

The pixel spacing required in order to be able to sinc-interpolate is $\Delta r < \pi/k_{\max}$. Equivalently,
with given pixel spacing $\Delta r$, the source image should not contain any signal at wavelengths less than
$\lambda_{\min} = 2\Delta r$, i.e. the shortest wavelength allowed just fits within two pixels. If this condition is
satisfied, the image is said to be critically sampled and can be shifted and re-sampled without any
loss of information.

One can show that for a telescope with primary diameter $D$, observing at wavelength $\lambda$, critical
sampling requires that the angular pixel size be less than $\Delta\theta = \lambda/2D$. For HST ($D = 2.4$ m) at a visible
wavelength of say $0.5\,\mu$m this requires a pixel size of about $0''.02$. For the wide field imager WFPC,
the wide-field pixels are $0''.1$ in size, so the instrument is not critically sampled. However, with
multiple exposures, it is possible to reconstruct a critically sampled image.
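The HST number quoted above follows directly from $\Delta\theta = \lambda/2D$. The short calculation below is just that arithmetic, using the values given in the text; the variable names are illustrative.

```python
import math

wavelength = 0.5e-6   # observing wavelength in metres (0.5 micron)
D = 2.4               # primary diameter in metres (HST)

# Critical-sampling pixel size, Delta theta = lambda / (2 D), in radians.
dtheta_rad = wavelength / (2.0 * D)

# Convert radians to arcseconds.
dtheta_arcsec = math.degrees(dtheta_rad) * 3600.0

print(f"critical pixel size = {dtheta_arcsec:.4f} arcsec")
```

The result is close to 0.02 arcsec, about a factor five smaller than the wide-field pixel size quoted above.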

A remarkable feature of sinc interpolation is that if one uses it to interpolate from one image
grid onto another which is just shifted with respect to the first, then if there is noise in the source
image which is uncorrelated from pixel to pixel, then the noise in the interpolated image is also
uncorrelated.
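Sinc interpolation as in (B.27) can be tried out on a simple band-limited test signal. The sketch below is an illustration under stated assumptions: the test signal ${\rm sinc}^2$ is our own choice (its transform is a triangle with $k_{\max} = \pi/2$), the window is taken at the Nyquist frequency $\pi/\Delta r$ so that the weights are `np.sinc((r0 - r_p)/dr)`, and the sum over samples is truncated at a finite number of pixels, so the recovery is accurate only to the truncation error.

```python
import numpy as np

def source(r):
    """Band-limited test signal; np.sinc(x) = sin(pi x)/(pi x), so k_max = pi/2."""
    return np.sinc(r / 4.0) ** 2

dr = 1.0                                  # pixel spacing, satisfies dr < pi/k_max = 2
r_p = np.arange(-400, 401) * dr           # pixel positions
f_p = source(r_p)                         # measured pixel values

def sinc_interpolate(r0):
    """Estimate f(r0) as a weighted sum of pixel values, as in (B.27)."""
    weights = np.sinc((r0 - r_p) / dr)
    return np.sum(f_p * weights)

r0 = 0.37                                 # a position between two pixels
estimate = sinc_interpolate(r0)
print(estimate, source(r0))
```

At a pixel position the weights reduce to one for that pixel and zero for all others, so the measured value is returned unchanged.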


B.11 Problems

B.11.1 Fourier Transforms

Compute the Fourier transforms of the following 1-dimensional functions. Also sketch both real-space
and Fourier-space functions.

1. A delta-function: $f(x) = \delta(x - x_0)$.

2. A step function $f(x) = 1, 0$ for $x > 0$, $x < 0$ respectively.

3. A Gaussian: $f(x) = \exp(-x^2/2\sigma^2)$.

4. A ‘double-slit’: $f(x) = \delta(x - d) + \delta(x + d)$.

5. A ‘dipole’: $f(x) = \delta(x - d) - \delta(x + d)$.

6. A ‘box-car’: $f(x) = 1$ if $|x| < L/2$, $f(x) = 0$ otherwise.

7. A ‘comb-function’: $f(x) = \sum_n \delta(x - n\Delta x)$.
Appendix C

The Boltzmann Formula 


Consider a gas of $N$ distinguishable particles, each of which can be in one of a set of discrete energy
states $E_i = i\Delta E$.

The number of different ways to assign these particles to a particular configuration, i.e. a certain
set of occupation numbers $\{n_i\}$ for the different energy levels, is

$$W = \frac{N!}{n_1!\, n_2! \cdots n_{i-1}!\, n_i!\, n_{i+1}! \cdots} \tag{C.1}$$

In the evolution of such a gas the occupation numbers will change as particles scatter off one
another, but must obey the constraints

$$\sum_i n_i = N \qquad \text{and} \qquad \sum_i n_i E_i = E_{\rm total}. \tag{C.2}$$


If the system explores the available states in a random manner then the probability of a configuration should be
proportional to $W$. Now there is a certain set of occupation numbers $\bar n_i$ for which $W$ (and therefore
also $\log W$) is maximized. We expect that the system will typically be found with occupation
numbers very similar to $\bar n_i$. For this most probable configuration $\bar n_i$ has a specific dependence on
energy $E_i$ which we will now deduce.

Consider a scattering event where two particles initially in level $i$ end up in levels $i - 1$, $i + 1$,
as illustrated in figure C.1. This clearly obeys the number and energy conservation. The new
complexion is

$$W' = \frac{N!}{n_1!\, n_2! \cdots (n_{i-1} + 1)!\, (n_i - 2)!\, (n_{i+1} + 1)! \cdots}. \tag{C.3}$$

The ratio of complexions is

$$\frac{W'}{W} = \frac{n_i (n_i - 1)}{(n_{i-1} + 1)(n_{i+1} + 1)} \simeq \frac{n_i^2}{n_{i-1}\, n_{i+1}} \tag{C.4}$$


where we have assumed that the occupation numbers are large. 

For the most probable configuration the complexion should be stationary with respect to small
changes (such as the single scattering event described above), so $W'/W = 1$ or

$$\frac{n_{i-1}}{n_i} = \frac{n_i}{n_{i+1}}. \tag{C.5}$$

This is true regardless of the choice of $i$. Thus, in the most probable configuration, the ratio of the
occupation of one level to that of the next higher level is a constant

$$n(E + \Delta E)/n(E) = \text{constant} \tag{C.6}$$

and the functional form which satisfies this is

$$n(E) = \alpha \exp(-\beta E) \tag{C.7}$$
Figure C.1: The circles represent schematically the number of particles in each energy level. A
possible ‘scattering’ event (i.e. one which respects conservation of both energy and particle number)
is illustrated. Here two particles initially in energy level $E_i$ end up in the adjacent energy
levels. For large occupation numbers, and near equilibrium, the probability for the state should be
stationary with respect to such exchanges. This requirement leads to the Boltzmann formula.


where $\alpha$ and $\beta$ are constants, whose values are fixed once we specify the total number of particles
and the total energy.

The occupation numbers $n(E)$ are proportional to the probability to find some arbitrary particle
with energy $E$. Thus we obtain the Boltzmann law

$$p(E) \propto \exp(-\beta E). \tag{C.8}$$
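The stationarity argument can be checked numerically by evaluating $\log W$ from (C.1) before and after a single scattering event. The sketch below is illustrative only: the values of $\alpha$, $\beta\Delta E$ and the number of levels are arbitrary assumptions, and `math.lgamma` is used to evaluate the logarithms of the factorials. A linearly decreasing set of occupation numbers is included for contrast.

```python
import math

def log_W(n):
    """log of the complexion W = N! / (n_1! n_2! ...), as in (C.1)."""
    N = sum(n)
    return math.lgamma(N + 1) - sum(math.lgamma(n_j + 1) for n_j in n)

def log_ratio_after_scattering(n, i):
    """log(W'/W) when two particles leave level i for levels i-1 and i+1."""
    n_new = list(n)
    n_new[i] -= 2
    n_new[i - 1] += 1
    n_new[i + 1] += 1
    return log_W(n_new) - log_W(n)

# Boltzmann occupation numbers n_j = alpha exp(-beta E_j), with E_j = j dE.
alpha, beta_dE, levels = 1.0e6, 0.5, 10
n_boltzmann = [round(alpha * math.exp(-beta_dE * j)) for j in range(levels)]

# A non-exponential distribution for comparison.
n_linear = [1000 * (levels - j) for j in range(levels)]

log_ratio_boltzmann = log_ratio_after_scattering(n_boltzmann, 4)
log_ratio_linear = log_ratio_after_scattering(n_linear, 4)
print(log_ratio_boltzmann, log_ratio_linear)
```

For the exponential distribution $\log(W'/W)$ is zero up to the rounding of the occupation numbers, while for the linear distribution the scattering event increases $W$, so that configuration is not the most probable one.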



Appendix D 

Dispersive Waves 


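Many physical systems have equations of motion that admit planar waves

$$\phi(\mathbf{x}, t) = \phi_{\mathbf{k}}\, e^{i(\omega_{\mathbf{k}} t - \mathbf{k}\cdot\mathbf{x})} \tag{D.1}$$

as either exact or approximate solutions. In this, and in what follows, the actual wave is the real
part of this expression. Examples covered here include sound waves, electro-magnetic waves, scalar
fields, deep ocean waves, waves in a plasma etc.

A pure plane wave propagates with velocity

$$\mathbf{c} = \omega_{\mathbf{k}}\, \mathbf{k}/k^2 \tag{D.2}$$

since

$$\phi(\mathbf{x}, t + \tau) = \phi_{\mathbf{k}}\, e^{i(\omega_{\mathbf{k}}(t + \tau) - \mathbf{k}\cdot\mathbf{x})} = \phi_{\mathbf{k}}\, e^{i(\omega_{\mathbf{k}} t - \mathbf{k}\cdot(\mathbf{x} - \mathbf{c}\tau))} = \phi(\mathbf{x} - \mathbf{c}\tau, t) \tag{D.3}$$

so the wave at time $t + \tau$ is identical to the wave at time $t$ but shifted by a distance $\Delta\mathbf{x} = \mathbf{c}\tau$.

Equation (D.2) is the velocity at which surfaces of constant phase $\psi(\mathbf{x}, t) = \omega_{\mathbf{k}} t - \mathbf{k}\cdot\mathbf{x}$ march across
space, and the speed of this motion $|\mathbf{c}| = \omega_{\mathbf{k}}/k$ is called the phase velocity (though it should really
be called the phase-speed).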

For non-dispersive systems, the phase speed is independent of the wave-number. These systems
admit, for instance, 1-dimensional solutions which propagate preserving their wave-form, and also
the response to a $\delta$-function source is an outgoing spherical wave with a sharp $\delta$-function pulse profile.
For many of the systems mentioned above, however, the phase speed depends on the wavelength.
This has the consequence that wave-packets travel at a different velocity from the wave-crests
(the group velocity) and that the response to a $\delta$-function source becomes a ‘chirp’.


D.l The Group Velocity 

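The dispersion relation is the relation between the spatial and temporal frequencies for a wave,
usually as an explicit equation for the frequency $\omega$ in terms of the wave-number $\mathbf{k}$:

$$\omega = \omega(\mathbf{k}). \tag{D.4}$$

We will often use the notation $\omega_{\mathbf{k}} = \omega(\mathbf{k})$. Also, for many systems, the frequency is independent
of the direction of the wave, and then $\omega = \omega(k)$ with $k = |\mathbf{k}|$. Non-dispersive systems have the ‘trivial’ linear
relation $\omega(k) = ck$.

Consider a disturbance which is the sum of two plane waves with wave-vectors $\mathbf{k}_1$ and $\mathbf{k}_2$:

$$\phi(\mathbf{x}, t) = e^{i(\omega_1 t - \mathbf{k}_1\cdot\mathbf{x})} + e^{i(\omega_2 t - \mathbf{k}_2\cdot\mathbf{x})}. \tag{D.5}$$

Now $e^{ia} + e^{ib} = e^{i(a+b)/2}\left(e^{i(a-b)/2} + e^{-i(a-b)/2}\right) = 2e^{i(a+b)/2}\cos((a - b)/2)$, so we can write this as

$$\phi(\mathbf{x}, t) = 2e^{i(\bar\omega t - \bar{\mathbf{k}}\cdot\mathbf{x})} \times \cos\left(\frac{\omega_1 - \omega_2}{2}\, t - \frac{\mathbf{k}_1 - \mathbf{k}_2}{2}\cdot\mathbf{x}\right) \tag{D.6}$$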
Figure D.1: Wave which is the sum of two neighboring frequencies $\cos(k_1 x) + \cos(k_2 x)$. This is the
product of two sinusoids, one the ‘carrier’ at the mean frequency $\bar k = (k_1 + k_2)/2$ and the other the
envelope at the modulating frequency $(k_1 - k_2)/2$. This phenomenon is known as beating. The wave
consists of packets or ‘sets’. The number of waves per set is $\sim \bar k/\Delta k$. If the system is dispersive, the
modulation pattern travels at a speed $v_{\rm group} = \Delta\omega/\Delta k$ which differs from the phase velocity.


where $\bar\omega$ and $\bar{\mathbf{k}}$ are the average temporal and spatial frequencies. If the two frequencies are similar,
so $|\mathbf{k}_1 - \mathbf{k}_2| \ll k_1, k_2$, then this is a rapidly oscillating plane wave $e^{i(\bar\omega t - \bar{\mathbf{k}}\cdot\mathbf{x})}$ being modulated
by a relatively slowly varying amplitude $2\cos((\Delta\omega\, t - \Delta\mathbf{k}\cdot\mathbf{x})/2)$ as illustrated in figure D.1.
Evidently, this modulation pattern travels at velocity

$$\mathbf{v} = \frac{\Delta\omega\, \Delta\mathbf{k}}{|\Delta\mathbf{k}|^2}. \tag{D.7}$$

For small $\Delta\mathbf{k}$ this tends to a well-defined limit, for which these sets, or groups, of waves travel at
the group velocity

$$\mathbf{v}_g = \frac{d\omega(\mathbf{k})}{d\mathbf{k}}. \tag{D.8}$$


For a non-dispersive system, the group- and phase-velocities are the same, but in general they 
differ. For example: 


• For deep ocean gravity waves, for instance, the dispersion relation is $\omega_k = \sqrt{gk}$ so $v_g = c/2$.
For these waves, the wave-crests march through the sets, appearing at the tail and vanishing
at the nose.

• For free de Broglie waves, with dispersion relation $\omega(k) = \hbar k^2/2m$, the phase-velocity is
$c = \hbar k/2m$ and the group velocity is $v_g = \hbar k/m = 2c$.

• For electro-magnetic waves in a cold plasma, or for Klein-Gordon waves, with $\omega(k)^2 = c^2k^2 +
m^2c^4/\hbar^2$, the phase velocity is $v_p = \omega/k = c\sqrt{1 + m^2c^2/\hbar^2k^2}$ which diverges as $k \to 0$, while
the group velocity is $v_g = d\omega/dk = c^2k/\omega$.
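The three examples above can be verified with a simple finite-difference estimate of $d\omega/dk$. The sketch below is illustrative: the helper `group_velocity`, the step size and the unit choices ($\hbar/m = 1$ for the de Broglie wave, $c = 1$ and $mc^2/\hbar = 1$ for the Klein-Gordon wave) are assumptions made for the example.

```python
import numpy as np

def group_velocity(omega, k, dk=1e-6):
    """Central-difference estimate of d(omega)/dk at wave-number k."""
    return (omega(k + dk) - omega(k - dk)) / (2.0 * dk)

g = 9.81  # gravitational acceleration in m/s^2

def omega_deep(k):
    return np.sqrt(g * k)            # deep ocean gravity waves

def omega_de_broglie(k):
    return 0.5 * k**2                # hbar k^2 / 2m with hbar/m = 1

def omega_klein_gordon(k):
    return np.sqrt(k**2 + 1.0)       # c = 1 and m c^2 / hbar = 1

k = 0.5
ratio_deep = group_velocity(omega_deep, k) / (omega_deep(k) / k)
ratio_de_broglie = group_velocity(omega_de_broglie, k) / (omega_de_broglie(k) / k)
product_kg = group_velocity(omega_klein_gordon, k) * (omega_klein_gordon(k) / k)

print(ratio_deep, ratio_de_broglie, product_kg)
```

The printed values are the ratio $v_g/c$ for the first two cases (one half and two respectively) and the product $v_p v_g$ for the Klein-Gordon case, which equals $c^2$.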
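D.2 Wave Packets


This argument can be generalized to describe the propagation of waveforms with more complicated
profiles. Let us define the real function

$$W(\mathbf{x}) = \sum_{\mathbf{q}} \tilde W(\mathbf{q})\, e^{i\mathbf{q}\cdot\mathbf{x}}, \tag{D.9}$$

where reality of $W(\mathbf{x})$ requires $\tilde W(-\mathbf{q}) = \tilde W^*(\mathbf{q})$. Now construct a traveling wave $\phi(\mathbf{x}, t)$ as the sum
of modes like (D.1) and with $\phi_{\mathbf{k}} = \tilde W(\mathbf{k} - \bar{\mathbf{k}})$:

$$\phi(\mathbf{x}, t) = \sum_{\mathbf{k}} \tilde W(\mathbf{k} - \bar{\mathbf{k}})\, e^{i(\omega_{\mathbf{k}} t - \mathbf{k}\cdot\mathbf{x})} = e^{i(\bar\omega t - \bar{\mathbf{k}}\cdot\mathbf{x})} \sum_{\mathbf{q}} \tilde W(\mathbf{q})\, e^{i(\Omega_{\mathbf{q}} t - \mathbf{q}\cdot\mathbf{x})} \tag{D.10}$$

where $\mathbf{q} = \mathbf{k} - \bar{\mathbf{k}}$ and

$$\Omega_{\mathbf{q}} = \omega(\bar{\mathbf{k}} + \mathbf{q}) - \omega(\bar{\mathbf{k}}) = q_i \frac{\partial\omega}{\partial k_i} + \frac{1}{2}\, q_i q_j \frac{\partial^2\omega}{\partial k_i \partial k_j} + \ldots \tag{D.11}$$

with partial derivatives evaluated at $\mathbf{k} = \bar{\mathbf{k}}$. If we keep only the term of first order in $\mathbf{q}$ here, the
wave is

$$\phi(\mathbf{x}, t) = W(\mathbf{x} - \mathbf{v}_g t)\, e^{i(\bar\omega t - \bar{\mathbf{k}}\cdot\mathbf{x})}. \tag{D.12}$$

where $\mathbf{v}_g = d\omega/d\mathbf{k}$ is the group velocity. This is again a rapid oscillation at the carrier frequency $\bar{\mathbf{k}}$
modulated by the moving envelope function $W(\mathbf{x} - \mathbf{v}_g t)$.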

This therefore describes a wave packet with profile $W(\mathbf{x})$ which propagates along at the group
velocity, preserving its shape. Now small $q$ means a large packet, since the values of $q$ for which
$\tilde W(\mathbf{q})$ is appreciable are on the order of $q \sim 1/L$, with $L$ the extent of the packet. Thus we expect
this non-evolving packet approximation to become better the larger the packet. For small packets, or
for long times, the packet will evolve. We can estimate the evolution time-scale for the wave-packet
if we keep the next order term in the Taylor expansion (D.11). We then have

$$\phi(\mathbf{x}, t) = W(\mathbf{x} - \mathbf{v}_g t, t)\, e^{i(\bar\omega t - \bar{\mathbf{k}}\cdot\mathbf{x})} \tag{D.13}$$

with

$$W(\mathbf{x}, t) = \sum_{\mathbf{q}} \tilde W(\mathbf{q})\, e^{i q_i q_j (\partial^2\omega/\partial k_i \partial k_j)\, t/2}\, e^{i\mathbf{q}\cdot\mathbf{x}}. \tag{D.14}$$

The transform of the wave-envelope in this approximation then becomes

$$\tilde W(\mathbf{q}, t) = \tilde W(\mathbf{q})\, e^{i q_i q_j (\partial^2\omega/\partial k_i \partial k_j)\, t/2}, \tag{D.15}$$


so the evolution of the packet profile is small for times $t \ll (q^2\, d^2\omega/dk^2)^{-1}$. The evolution of a small
wave packet is shown in figure D.2. We can identify three time-scales here: $\tau_{\rm phase}$, $\tau_{\rm cross}$ and $\tau_{\rm evol}$,
being the inverse of the carrier frequency, the time for the packet to pass through its own length,
and the time for the wave-packet to evolve. These are

$$\tau_{\rm phase} \sim \frac{1}{\bar\omega}, \qquad \tau_{\rm cross} \sim \frac{L}{v_{\rm group}} \sim \frac{1}{q\, d\omega/dk}, \qquad \tau_{\rm evol} \sim \frac{1}{q^2\, d^2\omega/dk^2}. \tag{D.16}$$

We have used here $L \sim 1/q$. For the deep ocean waves, for example, where $\omega(k) = \sqrt{gk}$, these
time-scales are in the ratio $1 : k/q : (k/q)^2$. But $k/q \sim N$, the number of waves in the packet, so a
packet containing say 100 waves will take $\sim 100$ times the wave period to pass through its length,
but will propagate $\sim 100$ times its length before substantially changing its shape. For a great many
naturally occurring dispersion relations $\tau_{\rm evol}/\tau_{\rm cross} \sim k/q$, so large wave packets ($L \gg 1/k$) persist
for a long time.
Figure D.2: Evolution of a wave packet. Initial packet is at top left and is moving to the left. Final
packet, after propagating once around the (periodic) box is at lower right. The spreading of the 
packet is evident. The dispersion relation is that of deep ocean waves. 
Figure D.3: Solutions of free-field wave equations have a 4-dimensional Fourier transform which is
confined to a pair of 3-dimensional hyper-surfaces $\omega = \pm\omega_{\mathbf{k}}$. For a non-dispersive wave these are a
pair of cones which meet at the origin. Here we show the form of the hyper-surface for dispersion
relation like that of a massive field: $\omega_{\mathbf{k}} = \sqrt{c^2k^2 + m^2}$.


D.3 Evolution of Dispersive Waves 

A common problem is to evolve some system of wave equations forward in time given some initial
field configuration. In general, one needs to cast the equations into some discretized form and then
step these equations forward. However, for non-interacting wave equations, life is much simpler since
the general field is a sum of traveling wave solutions like (D.1). Thus, to evolve from some initial
field configuration at $t = 0$ to some later time we need only determine the coefficients $\phi_{\mathbf{k}}$, and then
multiply these by the complex phase factor $e^{i(\omega_{\mathbf{k}} t - \mathbf{k}\cdot\mathbf{x})}$ to synthesize the field $\phi(\mathbf{x}, t)$. One might
guess that the coefficients $\phi_{\mathbf{k}}$ are just the Fourier transform of the initial field $\phi(\mathbf{x}, t = 0)$. However,
this is not quite correct; consider the waves $\phi = \cos(\omega t - \mathbf{k}\cdot\mathbf{x})$ and $\phi = \cos(-\omega t - \mathbf{k}\cdot\mathbf{x})$. These
waves travel in directions $\mathbf{k}$ and $-\mathbf{k}$ respectively, but have identical transform at $t = 0$. However,
these fields have different time derivative at $t = 0$, so if we supply $\phi(\mathbf{x}, t = 0)$ and $\dot\phi(\mathbf{x}, t = 0)$ then
this should suffice to determine the future evolution. This is not too surprising; these wave-like
solutions often arise as the solution of second order differential equations for which, to fully specify
the initial state, we need to specify both the initial displacement $\phi(\mathbf{x}, 0)$ and the initial velocity
$\dot\phi(\mathbf{x}, 0)$.

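A general field in space-time $\phi(\vec x) = \phi(\mathbf{x}, t)$ has a 4-dimensional Fourier transform

$$\tilde\phi(\vec k) = \int d^4x\, \phi(\vec x)\, e^{i\vec k\cdot\vec x} \tag{D.17}$$

with inverse transform

$$\phi(\vec x) = \int \frac{d^4k}{(2\pi)^4}\, \tilde\phi(\vec k)\, e^{-i\vec k\cdot\vec x}. \tag{D.18}$$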


For a free field, however, the transform vanishes except on the hyper-surfaces $\omega = \pm\omega_{\mathbf{k}}$ (see figure
D.3). The modes $\tilde\phi(\vec k)$ and $\tilde\phi(-\vec k)$ together describe a disturbance propagating in the direction $\mathbf{k}$,
while $\tilde\phi(\vec k')$ and $\tilde\phi(-\vec k')$ propagate in the direction $-\mathbf{k}$. For a real field $\phi(\vec x)$, $\tilde\phi(-\vec k) = \tilde\phi^*(\vec k)$, so the
negative energy modes are fully determined once we specify the positive energy mode amplitudes.
To specify the field then we need to provide the real and imaginary components of the complex field
$\tilde\phi(\omega_{\mathbf{k}}, \mathbf{k})$ at each point in $\mathbf{k}$-space. This corresponds to the need to provide both $\phi(\mathbf{x}, t = 0)$ and
$\dot\phi(\mathbf{x}, t = 0)$ at each point in real space.

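Let us write the field as a sum over positive and negative frequency components:

$$\phi(\mathbf{x}, t) = \int \frac{d^3k}{(2\pi)^3}\, \phi^+(\mathbf{k})\, e^{i(\omega_{\mathbf{k}} t - \mathbf{k}\cdot\mathbf{x})} + \int \frac{d^3k}{(2\pi)^3}\, \phi^-(\mathbf{k})\, e^{-i(\omega_{\mathbf{k}} t - \mathbf{k}\cdot\mathbf{x})} \tag{D.19}$$

Taking the spatial Fourier transforms of $\phi$ and $\dot\phi$ at $t = 0$, we find

$$\tilde\phi(\mathbf{k}) = \int d^3x\, \phi(\mathbf{x}, 0)\, e^{i\mathbf{k}\cdot\mathbf{x}} = \phi^+(\mathbf{k}) + \phi^-(-\mathbf{k}) \tag{D.20}$$

and

$$\tilde{\dot\phi}(\mathbf{k}) = \int d^3x\, \dot\phi(\mathbf{x}, 0)\, e^{i\mathbf{k}\cdot\mathbf{x}} = i\omega_{\mathbf{k}}\, \phi^+(\mathbf{k}) - i\omega_{\mathbf{k}}\, \phi^-(-\mathbf{k}). \tag{D.21}$$

Solving for $\phi^+(\mathbf{k})$ and $\phi^-(\mathbf{k})$ gives

$$\phi^+(\mathbf{k}) = \frac{1}{2}\left[\tilde\phi(\mathbf{k}) + \frac{\tilde{\dot\phi}(\mathbf{k})}{i\omega_{\mathbf{k}}}\right] \tag{D.22}$$

and

$$\phi^-(\mathbf{k}) = \frac{1}{2}\left[\tilde\phi(-\mathbf{k}) - \frac{\tilde{\dot\phi}(-\mathbf{k})}{i\omega_{\mathbf{k}}}\right]. \tag{D.23}$$

For example, consider a 1-dimensional, initially static, disturbance

$$\phi(x, y, z, t = 0) = f(x), \qquad \dot\phi(x, y, z, t = 0) = 0. \tag{D.24}$$

Evaluating the transforms and using the above pair of equations gives

$$\phi^+(\mathbf{k}) = \phi^-(-\mathbf{k}) = \delta(k_y)\, \delta(k_z)\, \tilde f(k_x)/2. \tag{D.25}$$

The future development of the field is

$$\phi(x, t) = \frac{1}{2}\int \frac{dk}{2\pi}\, \tilde f(k)\, e^{i(\omega_k t - kx)} + {\rm c.c.} \tag{D.26}$$

For a non-dispersive system (i.e. one for which $\omega_k = c|k|$) this is

$$\phi(x, t) = \frac{1}{2}\int \frac{dk}{2\pi}\, \tilde f(k)\, e^{i(c|k|t - kx)} + {\rm c.c.} = \frac{1}{2}\left[f(x + ct) + f(x - ct)\right]. \tag{D.27}$$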

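As another example, consider a localized impulse which we model as $\phi(\mathbf{x}, 0) = \delta(\mathbf{x})$, $\dot\phi(\mathbf{x}, 0) = 0$.
Now we have $\phi^+(\mathbf{k}) = \phi^-(\mathbf{k}) = $ constant and the future development of the wave is

$$\phi(\mathbf{x}, t) \propto \int d^3k\, e^{i(\omega_{\mathbf{k}} t - \mathbf{k}\cdot\mathbf{x})} + \int d^3k\, e^{-i(\omega_{\mathbf{k}} t - \mathbf{k}\cdot\mathbf{x})} \tag{D.28}$$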


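For given $\mathbf{x}$, $t$ the major contribution to, say, the first integral here comes from the region of $\mathbf{k}$-space
where the phase $\psi(\mathbf{k}; \mathbf{x}, t) = \omega_{\mathbf{k}} t - \mathbf{k}\cdot\mathbf{x}$ is stationary, i.e. where $\partial\psi/\partial\mathbf{k} = 0$. Performing the
derivative, we see that the phase is stationary for wave number $\mathbf{k} = \mathbf{k}_0(\mathbf{x}, t)$ such that $\mathbf{v}_g(\mathbf{k}_0) = \mathbf{x}/t$.
In the vicinity of this stationary point, we can expand the phase as

$$\psi(\mathbf{k}; \mathbf{x}, t) \simeq \omega_{\mathbf{k}_0} t - \mathbf{k}_0\cdot\mathbf{x} + \frac{1}{2}\, q_i q_j\, t\, \frac{\partial^2\omega_{\mathbf{k}}}{\partial k_i \partial k_j} \tag{D.29}$$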
where $\mathbf{q} = \mathbf{k} - \mathbf{k}_0$ and the partial derivatives are to be evaluated at $\mathbf{k} = \mathbf{k}_0(\mathbf{x}, t)$. In this approximation
the wave solution is

$$\phi(\mathbf{x}, t) \sim e^{i(\omega_{\mathbf{k}_0} t - \mathbf{k}_0\cdot\mathbf{x})} \int d^3q\, \exp\left(\frac{i}{2}\, q_i q_j\, t\, \frac{\partial^2\omega_{\mathbf{k}}}{\partial k_i \partial k_j}\right) \tag{D.30}$$

with a similar contribution from the second integral. The integral here is a slowly varying envelope
which modulates the relatively rapidly varying complex exponential factor. At fixed position $\mathbf{x}$,
the wave has a sinusoidal variation with time varying frequency $\omega(\mathbf{k}_0(\mathbf{x}, t))$ and with time varying
amplitude. This is called a ‘chirp’.

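Specializing to the case of an isotropic dispersion relation $\omega(\mathbf{k}) = \omega(k)$, the quadratic form
appearing here is

$$q_i q_j \frac{\partial^2\omega_{\mathbf{k}}}{\partial k_i \partial k_j} = \frac{1}{k}\frac{d\omega}{dk}\, q_\perp^2 + \frac{d^2\omega}{dk^2}\, q_\parallel^2 \tag{D.31}$$

where $\mathbf{q}_\parallel = (\hat{\mathbf{k}}\cdot\mathbf{q})\hat{\mathbf{k}}$ and $\mathbf{q}_\perp = \mathbf{q} - \mathbf{q}_\parallel$. The value of the integral is then $\sim k\, (d\omega/dk)^{-1} (d^2\omega/dk^2)^{-1/2}$,
and similarly in two dimensions it is $\sim \sqrt{k/\left((d\omega/dk)(d^2\omega/dk^2)\right)}$. As an application, consider deep
ocean waves for which $\omega = \sqrt{gk}$. The group velocity is $d\omega/dk = \frac{1}{2}\sqrt{g/k}$ and the stationary phase
condition $\mathbf{v}_g(\mathbf{k}_0) = \mathbf{x}/t$ gives $\mathbf{k}_0(\mathbf{x}, t) = gt^2\mathbf{x}/4x^3$ and the frequency is $\omega_0(\mathbf{x}, t) = \omega(\mathbf{k}_0) = gt/2x$.
The determinant is $|\partial^2\omega/\partial k_i \partial k_j| \sim g/k_0^3$. The envelope is

$$\int d^2q\, \exp\left(\frac{i}{2}\, q_i q_j\, t\, \frac{\partial^2\omega_{\mathbf{k}}}{\partial k_i \partial k_j}\right) \sim k_0(\mathbf{x}, t)^{3/2}\, t^{-1} \tag{D.32}$$

so the wave develops as

$$\phi(\mathbf{x}, t) \sim k_0^{3/2}\, t^{-1}\, e^{i(\omega_0 t - \mathbf{k}_0\cdot\mathbf{x})} \sim \frac{t^2}{x^3}\cos(gt^2/4x). \tag{D.33}$$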
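The stationary-phase results quoted for deep ocean waves can be confirmed with a few lines of arithmetic. In the sketch below the values of $x$ and $t$ are arbitrary illustrative choices; the code checks that the stated $k_0$ satisfies $v_g(k_0) = x/t$, and that the frequency and phase take the forms used in (D.33).

```python
import math

g = 9.81        # gravitational acceleration in m/s^2
x = 1000.0      # distance from the impulse in metres (illustrative)
t = 60.0        # time since the impulse in seconds (illustrative)

# Stationary wave-number quoted in the text: k_0 = g t^2 / (4 x^2).
k0 = g * t**2 / (4.0 * x**2)

# Group velocity and frequency for omega = sqrt(g k), evaluated at k_0.
v_group = 0.5 * math.sqrt(g / k0)
omega0 = math.sqrt(g * k0)

# Phase of the chirp, omega_0 t - k_0 x.
phase = omega0 * t - k0 * x

print(v_group, x / t)
print(omega0, g * t / (2.0 * x))
print(phase, g * t**2 / (4.0 * x))
```

Each printed pair should agree: the group velocity at $k_0$ equals $x/t$, the local frequency is $gt/2x$, and the phase is $gt^2/4x$.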


Had we instead applied an initial impulsive velocity $\dot\phi(\mathbf{x}, t = 0) = \delta(\mathbf{x})$ with zero displacement
$\phi(\mathbf{x}, t = 0) = 0$ then this generates a spectrum $|\phi^+(k)|^2 \propto 1/\omega^2 \propto 1/k$ rather than a flat spectrum.
Similarly, if the initial impulse is not point-like but is extended over some region of size $l$ then this
will give a cut-off in the power spectrum at $k \sim 1/l$. For example, for a Gaussian velocity impulse
with profile $\dot\phi(\mathbf{x}, 0) \propto \exp(-x^2/2\sigma^2)$ the wave is

$$\phi(\mathbf{x}, t) \sim \frac{t^2}{x^2}\exp(-g^2t^4\sigma^2/8x^4)\cos(gt^2/4x). \tag{D.34}$$

This is plotted in figure D.4. The 2-dimensional disturbance generated from a localised random
disturbance is shown in figure D.5.
Figure D.4: Evolution of a spherical deep ocean water wave. The quantity plotted is $\phi =
(t^2/x^2)\cos(t^2/4x)\exp(-t^4/8x^4)$, for $t = 15, 30, 45, 60$. This is the ‘chirp’ generated by an impulse
with a Gaussian profile with scale $\sigma = 1$.
Figure D.5: Evolution of deep ocean water waves from a localised ‘storm’. The initial conditions
were uniform aside from a small region near the origin in which the field was given a random ‘white
noise’ displacement. This was then evolved as described in the text.
Appendix E

Relativistic Covariance of 
Electromagnetism 


Here we review the relativistic formulation of the laws of electromagnetism. 

The electronic charge $e$ is a constant and is the same in all frames of reference. This means
that the charge density $\rho = en_e$ transforms so that $\rho/E$ is conserved, i.e. $\rho$ transforms like the time-component
of a four-vector. Similarly, the current density $\mathbf{j}$ transforms like the spatial components
of a 4-vector so we can combine $\rho$ and $\mathbf{j}$ to form the four-current

$$\vec j = j^\mu = (c\rho, \mathbf{j}). \tag{E.1}$$


Some features of the transformation of the four-current are quite familiar. For example, it is clear 
that a static charged rod, if put into motion becomes a current. Less familiar perhaps is the fact 
that an electrically neutral rod, but containing a current in the form of electrons moving against a 
static background of ions will, in a boosted frame, have a non-zero charge density. This has to do 
with the fact that the density of a set of particles depends on the frame of reference. 

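The equation of charge conservation is

$$\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j} = 0 \quad\Longleftrightarrow\quad j^\mu{}_{,\mu} = 0 \tag{E.2}$$

which is covariant (it says that the scalar quantity $j^\mu{}_{,\mu}$, the four-divergence of the four-current,
vanishes).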

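Maxwell’s equations in terms of the potential can also be expressed in a covariant manner:

$$\left.\begin{aligned} \Box\mathbf{A} &= -4\pi\mathbf{j}/c \\ \Box\phi &= -4\pi\rho \end{aligned}\right\} \quad\Longleftrightarrow\quad \Box\vec A = -4\pi\vec j/c \tag{E.3}$$

with

$$\vec A = A^\mu = (\phi, \mathbf{A}) \tag{E.4}$$

another four-vector.

The Lorentz gauge condition can also be expressed in a covariant manner:

$$\nabla\cdot\mathbf{A} + \dot\phi/c = 0 \quad\Longleftrightarrow\quad A^\mu{}_{,\mu} = 0. \tag{E.5}$$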


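It turns out that the fields $\mathbf{E}$ and $\mathbf{B}$ are the non-vanishing parts of the Faraday tensor $F_{\mu\nu}$,
which is the anti-symmetrized derivative of the four-potential: $F_{\mu\nu} = A_{[\mu,\nu]} = A_{\mu,\nu} - A_{\nu,\mu}$ with components

$$F_{\mu\nu} = \begin{bmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & B_z & -B_y \\ E_y & -B_z & 0 & B_x \\ E_z & B_y & -B_x & 0 \end{bmatrix} \tag{E.6}$$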
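The transformation law for $\mathbf{E}$, $\mathbf{B}$ is

$$\begin{aligned} \mathbf{E}'_\parallel &= \mathbf{E}_\parallel, & \mathbf{B}'_\parallel &= \mathbf{B}_\parallel, \\ \mathbf{E}'_\perp &= \gamma(\mathbf{E}_\perp + \boldsymbol{\beta}\times\mathbf{B}), & \mathbf{B}'_\perp &= \gamma(\mathbf{B}_\perp - \boldsymbol{\beta}\times\mathbf{E}). \end{aligned} \tag{E.7}$$

See R+L for simple physical examples illustrating these transformation laws.
The relativistically correct expression of the Lorentz force law is

$$\frac{dU^\mu}{d\tau} = \frac{q}{mc}\, F^\mu{}_\nu U^\nu \tag{E.8}$$

whose time and space components can be written as

$$\frac{dE}{dt} = q\mathbf{E}\cdot\mathbf{v}, \qquad \frac{d\mathbf{P}}{dt} = q\left[\mathbf{E} + (\mathbf{v}\times\mathbf{B})/c\right] \tag{E.9}$$

which is identical to the non-relativistic form, except that the momentum here is the relativistic
momentum $\mathbf{P} = \gamma m\mathbf{v}$.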


E.l EM Field of a Rapidly Moving Charge 


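In the rest frame of a charge (primed frame) the electric potential is $\phi' = q/r'$ and the magnetic
potential $\mathbf{A}'$ vanishes, so applying a boost along the $x$-axis gives the potential in the lab-frame
(unprimed frame)

$$A^\mu = \Lambda^\mu{}_\nu A'^\nu = \begin{bmatrix} \gamma q/r' \\ \gamma\beta q/r' \\ 0 \\ 0 \end{bmatrix} \tag{E.10}$$

so

$$\phi = \gamma q/r', \qquad \mathbf{A} = \beta\gamma\, \hat{\mathbf{x}}\, q/r' \tag{E.11}$$

with

$$r' = \sqrt{x'^2 + y'^2 + z'^2} = \sqrt{\gamma^2(x - vt)^2 + y^2 + z^2}. \tag{E.12}$$

Here the sign of $\mathbf{A}$ is the one required for consistency with (E.13), for a charge moving in the $+x$ direction.
Computing the electric field $\mathbf{E} = -\dot{\mathbf{A}}/c - \nabla\phi$ yields

$$\mathbf{E} = \frac{q\gamma}{r'^3}\begin{bmatrix} x - vt \\ y \\ z \end{bmatrix}. \tag{E.13}$$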

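For a field point at $x = z = 0$, $y = b$, we have $E_z = 0$ and

$$E_y(t) = \frac{\gamma q b}{(\gamma^2v^2t^2 + b^2)^{3/2}} \qquad \text{and} \qquad E_x(t) = -\frac{\gamma q v t}{(\gamma^2v^2t^2 + b^2)^{3/2}}. \tag{E.14}$$

If we define the dimensionless time $y = \gamma vt/b$ then we can write

$$\frac{b^2E_y(t)}{\gamma q} = \frac{1}{(1 + y^2)^{3/2}} \qquad \text{and} \qquad \frac{b^2E_x(t)}{\gamma q} = -\frac{1}{\gamma}\frac{y}{(1 + y^2)^{3/2}}. \tag{E.15}$$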


These functions are shown in figure E.1. Both of these are impulses of extent $\Delta y \sim 1 \rightarrow \Delta t \sim b/\gamma v$.
For highly relativistic motion this means that the electric field configuration is not spherical (in
which case the period of the impulse would be $\Delta t \sim b/v$) but is compressed along the direction of
motion by a factor $\gamma$. Also, for highly relativistic motion $E_x \ll E_y$. The $y$-component has maximum
value $E_{\max} = \gamma q/b^2$.
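The pulse shapes (E.14) are easy to tabulate. The sketch below is an illustration in units with $q = b = c = 1$ and an assumed speed $v = 0.9c$; the function name `pulse` and the time grid are our own choices. It confirms that the peak of $E_y$ is $\gamma q/b^2$ and that the peak of $|E_x|$ is smaller by a factor $2/(3\sqrt{3}\gamma)$, which follows from maximizing $y/(1+y^2)^{3/2}$ in (E.15).

```python
import numpy as np

def pulse(t, q=1.0, b=1.0, v=0.9, c=1.0):
    """Return (E_x, E_y) at the field point x = z = 0, y = b, as in (E.14)."""
    gamma = 1.0 / np.sqrt(1.0 - (v / c)**2)
    denom = (gamma**2 * v**2 * t**2 + b**2)**1.5
    E_x = -gamma * q * v * t / denom
    E_y = gamma * q * b / denom
    return E_x, E_y

v = 0.9
gamma = 1.0 / np.sqrt(1.0 - v**2)
t = np.linspace(-5.0, 5.0, 100001)
E_x, E_y = pulse(t, v=v)

peak_E_y = E_y.max()
peak_ratio = np.abs(E_x).max() / peak_E_y

print(peak_E_y, gamma)
print(peak_ratio, 2.0 / (3.0 * np.sqrt(3.0) * gamma))
```

Increasing `v` towards `c` makes the pulse both taller and narrower in time, and suppresses $E_x$ relative to $E_y$.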
Appendix F

Complex Analysis 


F.l Complex Numbers and Functions 

• An example of a complex number is $z = x + iy$, where $x$, $y$ are real and $i = \sqrt{-1}$.

• A complex number corresponds to a point $(x, y)$ on the Argand Diagram.

• The complex number $z$ can also be represented in ‘polar coordinates’ as $z = ae^{i\varphi} = a(\cos(\varphi) +
i\sin(\varphi))$, with $a = \sqrt{x^2 + y^2}$ and $\tan(\varphi) = y/x$.

• Complex numbers can be added like 2-vectors, but unlike ordinary 2-vectors can also be directly
multiplied, $zz' = aa'e^{i(\varphi + \varphi')}$, and divided, $z/z' = (a/a')e^{i(\varphi - \varphi')}$.

• Multiplying a complex number $z'$ by $z = ae^{i\varphi}$ increases its length by a factor $a$ and rotates it
in the Argand plane by an angle $\varphi$.

• A complex function $f(z)$ associates with each point $z$ a complex number $f(z)$.


F.2 Analytic Functions 


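A complex function $f(z)$ can be written as

$$f(z) = u(x, y) + iv(x, y) \tag{F.1}$$

where $u(x, y)$ and $v(x, y)$ are real.

A very special, but nonetheless extremely important, class of complex functions are analytic
complex functions, defined as those for which

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \tag{F.2}$$

These are called the Cauchy-Riemann conditions.
If we use the notation

$$f = u + iv = \begin{bmatrix} u \\ v \end{bmatrix} \tag{F.3}$$

the differential of an arbitrary complex function $f(z) = u + iv$ is

$$\Delta f = \begin{bmatrix} \Delta u \\ \Delta v \end{bmatrix} = \begin{bmatrix} \partial u/\partial x & \partial u/\partial y \\ \partial v/\partial x & \partial v/\partial y \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}. \tag{F.4}$$

Now if $f(z)$ is analytic we can use the Cauchy-Riemann conditions to eliminate $\partial v/\partial x$, $\partial v/\partial y$ to
give

$$\Delta f = \begin{bmatrix} \Delta u \\ \Delta v \end{bmatrix} = \begin{bmatrix} \partial u/\partial x & \partial u/\partial y \\ -\partial u/\partial y & \partial u/\partial x \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}. \tag{F.5}$$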
Figure F.1: If an analytic function $f(z) = u(x, y) + iv(x, y)$ is known at all points along the $x$-axis
then the value at some point $(x, \Delta y)$ is given in terms of the values of $u$ and $v$ at the neighboring
points on the $x$-axis.


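This is rather interesting; the matrix appearing here looks a lot like a rotation matrix: $R =
\{\{\cos\theta, -\sin\theta\}, \{\sin\theta, \cos\theta\}\}$. Indeed, if we define $a' = \sqrt{(\partial u/\partial x)^2 + (\partial u/\partial y)^2}$ and $\tan(\varphi') =
-(\partial u/\partial y)/(\partial u/\partial x)$ (the minus sign is needed to match the off-diagonal elements in (F.5)) the differential of $f$ becomes

$$\begin{bmatrix} \Delta u \\ \Delta v \end{bmatrix} = a' \begin{bmatrix} \cos(\varphi') & -\sin(\varphi') \\ \sin(\varphi') & \cos(\varphi') \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}. \tag{F.6}$$

But this is just a multiplication of $\Delta z = \Delta x + i\Delta y$ by a factor $a'$ and a rotation through an angle
$\varphi'$, i.e. it is a multiplication of $\Delta z$ by another complex number $f' = a'e^{i\varphi'}$:

$$\Delta f = f'\Delta z \tag{F.7}$$

Thus an analytic function $f(z)$ has derivative

$$f'(z) = \lim_{\Delta z \to 0} \frac{\Delta f}{\Delta z} \tag{F.8}$$

which is also a complex function.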
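The Cauchy-Riemann conditions can be tested numerically with finite differences. In terms of the complex partial derivatives $\partial f/\partial x = u_x + iv_x$ and $\partial f/\partial y = u_y + iv_y$, the two conditions (F.2) are equivalent to the single statement $\partial f/\partial y = i\,\partial f/\partial x$. The sketch below is an illustration; the helper `partials`, the step size and the two test functions are our own choices.

```python
import cmath

def partials(f, z, h=1e-6):
    """Central-difference estimates of df/dx and df/dy at the point z."""
    f_x = (f(z + h) - f(z - h)) / (2.0 * h)              # u_x + i v_x
    f_y = (f(z + 1j * h) - f(z - 1j * h)) / (2.0 * h)    # u_y + i v_y
    return f_x, f_y

z0 = 0.3 + 0.7j

# An analytic function: exp(z). Cauchy-Riemann holds, and f'(z) = df/dx.
f_x, f_y = partials(cmath.exp, z0)
cr_violation_exp = abs(f_y - 1j * f_x)

# A non-analytic function: complex conjugation.
g_x, g_y = partials(lambda z: z.conjugate(), z0)
cr_violation_conj = abs(g_y - 1j * g_x)

print(cr_violation_exp, cr_violation_conj)
print(f_x, cmath.exp(z0))
```

For the exponential the violation is zero to within the finite-difference error and the derivative agrees with $e^z$, while for complex conjugation the conditions fail, so that function has no complex derivative in the sense of (F.8).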


F.3 Analytic Continuation 

A general complex function /(z) = u(x,y) + iv(x,y) is clearly a two dimensional complex function; 
we need to specify the values of the real and imaginary parts at all points on the plane. Analytic 
functions have the magical property that they are essentially one dimensional, in the sense that if the 
values of u , v are known along some line in the Argand diagram then they are known everywhere. 

To see how this comes about, let's discretize the Argand plane with a grid spacing $\Delta x = \Delta y$, as
illustrated in figure F.1. We will assume that the grid spacing is very small compared to the scale
of any variations in $u$ or $v$. Let's also assume that $u$ and $v$ are known at all points along the x-axis.
The values of $u$ and $v$ at some point $z = (x, \Delta y)$ (i.e. a distance $\Delta y$ above the point $z = (x, 0)$) are


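$$\begin{aligned} u(x, \Delta y) &= u(x, 0) + \frac{\partial u}{\partial y}\Delta y = u(x, 0) - \frac{\partial v}{\partial x}\Delta y \\ v(x, \Delta y) &= v(x, 0) + \frac{\partial v}{\partial y}\Delta y = v(x, 0) + \frac{\partial u}{\partial x}\Delta y \end{aligned} \tag{F.9}$$

where we have used the Cauchy-Riemann conditions (F.2). But with $\partial u/\partial x = (u(x + \Delta x, 0) - u(x, 0))/\Delta x$ etc. and $\Delta x = \Delta y$ these are

$$\begin{aligned} u(x, \Delta y) &= u(x, 0) - v(x + \Delta x, 0) + v(x, 0) \\ v(x, \Delta y) &= v(x, 0) + u(x + \Delta x, 0) - u(x, 0). \end{aligned} \tag{F.10}$$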
Since all of the values on the right hand side are known for all $x$, this determines the values of $f(z)$
for all $x$ along the line $y = \Delta y$. These in turn determine $f(z)$ along the line $y = 2\Delta y$ and so on.
If we let $\Delta x$, $\Delta y$ tend to zero, this becomes exact, and we can thereby analytically continue the
function $f(z)$ off the real axis, or indeed off any line across the Argand plane.
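
The row-by-row marching described here can be sketched in a few lines. This is an illustration under assumptions of our own: a periodic grid in $x$, the test function $e^{iz}$ (whose exact continuation is known), forward differences $\partial u/\partial x \simeq (u(x+\Delta x) - u(x))/\Delta x$, and hypothetical helper names. The scheme is fairly crude; it is only first-order accurate and amplifies short-wavelength noise, so the sketch marches just a few rows.

```python
import cmath
import math


def march_one_row(u, v):
    """Advance u, v by one row (dy = dx) using forward differences in x
    combined with the Cauchy-Riemann conditions, on a periodic grid."""
    n = len(u)
    u_new = [u[j] - (v[(j + 1) % n] - v[j]) for j in range(n)]
    v_new = [v[j] + (u[(j + 1) % n] - u[j]) for j in range(n)]
    return u_new, v_new


def continue_off_axis(f_on_axis, n_points, n_rows):
    """Sample f on the real axis over one period 2*pi, then march n_rows upward."""
    dx = 2 * math.pi / n_points
    u = [f_on_axis(j * dx).real for j in range(n_points)]
    v = [f_on_axis(j * dx).imag for j in range(n_points)]
    for _ in range(n_rows):
        u, v = march_one_row(u, v)
    return dx, u, v


n_points, n_rows = 2000, 20
dx, u, v = continue_off_axis(lambda x: cmath.exp(1j * x), n_points, n_rows)

# Compare with the exact continuation exp(i z) at z = x + i y, y = n_rows * dx.
y = n_rows * dx
max_error = max(
    abs(complex(u[j], v[j]) - cmath.exp(1j * complex(j * dx, y)))
    for j in range(n_points)
)
print(y, max_error)
```

The error shrinks as the grid is refined at fixed height $y$, in line with the statement that the procedure becomes exact as $\Delta x, \Delta y \to 0$.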

The foregoing is valid provided $f(z)$ is analytic, but will cease to apply in regions where $f(z)$ does
not obey the Cauchy-Riemann conditions (F.2). As we shall see, there are complex functions which
are analytic except at one or more points in the Argand plane where the function $f(z)$ becomes
infinite. These points are known as the poles of the function $f(z)$.


F.4 Contour Integration 

The great utility of complex analysis for astrophysicists is that it allows us to evaluate, or provide
analytic formulae for, integrals occurring in physical problems. The essential result used here is that
the value of a contour integral, $I = \oint dz\, f(z)$, taken around some closed loop in the Argand plane is
invariant under deformation of the integration path (provided the path is not forced to cross over a
pole).

We can see how this applies in the foregoing example. Let $I(y_0)$ denote the integral of $f$ along
the line $y = y_0$ in the Argand plane, so for this path $dz = dx$. In the discretized complex plane, for
$y_0 = 0$ this is

$$I(0) = \int dx\, f(x) = \sum_x \Delta x\, \big(u(x, 0) + i v(x, 0)\big) \tag{F.11}$$

but the integral $I(\Delta y)$ is identical since, with $u(x, \Delta y)$, $v(x, \Delta y)$ given by (F.10), the terms involving
$v$ in the $u$ integration cancel and vice versa. The value of $I(y_0)$ is therefore independent of which
contour we use.
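
The cancellation can be checked directly. In the sketch below the next row is built from forward differences, $u(x, \Delta y) = u(x,0) - [v(x+\Delta x, 0) - v(x, 0)]$ and $v(x, \Delta y) = v(x,0) + [u(x+\Delta x,0) - u(x,0)]$, and we assume a periodic grid so that the summed differences telescope to zero. The sample data and the helper names `next_row` and `row_integral` are arbitrary choices made for illustration.

```python
import math


def next_row(u, v):
    """One upward step with dy = dx: forward differences plus Cauchy-Riemann."""
    n = len(u)
    u_new = [u[j] - v[(j + 1) % n] + v[j] for j in range(n)]
    v_new = [v[j] + u[(j + 1) % n] - u[j] for j in range(n)]
    return u_new, v_new


def row_integral(u, v, dx):
    """Discrete line integral along a row: sum over x of dx * (u + i v)."""
    return dx * complex(sum(u), sum(v))


n = 64
dx = 2 * math.pi / n
u0 = [math.cos(3 * j * dx) + 0.5 for j in range(n)]   # arbitrary periodic data
v0 = [math.sin(j * dx) ** 2 for j in range(n)]

u1, v1 = next_row(u0, v0)
I0 = row_integral(u0, v0, dx)
I1 = row_integral(u1, v1, dx)
print(I0, I1)
```

The two sums agree to rounding error because each difference term, summed around the periodic row, contributes nothing.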

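More generally, in the vicinity of the origin we have

$$f(z) = f(0) + \frac{\partial u}{\partial x} x + \frac{\partial u}{\partial y} y + i\frac{\partial v}{\partial x} x + i\frac{\partial v}{\partial y} y \tag{F.12}$$

so with $dz = dx + i\,dy$ we have

$$f(z)\,dz = \left( f(0) + \frac{\partial u}{\partial x} x + \frac{\partial u}{\partial y} y + i\frac{\partial v}{\partial x} x + i\frac{\partial v}{\partial y} y \right) (dx + i\,dy). \tag{F.13}$$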


The partial derivatives are to be considered constant here. Now when we perform the integration
$\oint dz\, f(z)$ around a closed loop, terms like $(\partial u/\partial x) \oint dx\, x$ vanish, since $x\,dx = d(x^2)/2$ is a total
derivative, so $\oint d(x^2) = 0$. The only terms which can contribute are terms like $(\partial u/\partial y) \oint dx\, y$, which,
in general, would give a contribution proportional to the area within the contour. However, for an
analytic function the Cauchy-Riemann conditions (F.2) allow us to replace $\partial v/\partial y$ with $\partial u/\partial x$ and
$\partial v/\partial x$ with $-\partial u/\partial y$, and we have


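$$\oint dz\, f(z) = \left[ \frac{\partial u}{\partial y} + i\frac{\partial u}{\partial x} \right] \left( \oint dx\, y + \oint dy\, x \right) = \left[ \frac{\partial u}{\partial y} + i\frac{\partial u}{\partial x} \right] \oint d(xy) = 0. \tag{F.14}$$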


However, the choice of origin was arbitrary, so we find that, quite generally, $\oint dz\, f(z)$ vanishes for
any small loop (provided $f(z)$ is analytic on and within the loop).

If we have some large loop then we can slightly deform it to obtain the integral around a new 
large loop plus the integral around a small closed loop, this being the difference between the two 
large loops, and which vanishes. Repeating this process allows us to continuously deform a contour 
integral without changing its value. If a function is analytic everywhere within some loop, for 
example, then the loop can be shrunk to zero and it follows that the integral around the original 
loop vanishes. 
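
Both halves of this argument can be illustrated numerically: a loop integral of an analytic function vanishes, while for a non-analytic function it is proportional to the enclosed area. The sketch below assumes circular contours, a simple midpoint rule, and test functions of our own choosing ($z^3 + e^z$ and $z^*$); the helper name `loop_integral` is hypothetical.

```python
import cmath
import math


def loop_integral(f, centre=0j, radius=1.0, n=2000):
    """Midpoint-rule estimate of the integral of f(z) dz around a circle,
    traversed anticlockwise, with z = centre + radius * exp(i phi)."""
    total = 0j
    dphi = 2 * math.pi / n
    for k in range(n):
        phase = cmath.exp(1j * (k + 0.5) * dphi)
        z = centre + radius * phase
        dz = 1j * radius * phase * dphi
        total += f(z) * dz
    return total


# Analytic everywhere inside the loop: the integral should vanish.
analytic_loop = loop_integral(lambda z: z ** 3 + cmath.exp(z))

# f(z) = z* is not analytic: the integral is 2i times the enclosed area,
# i.e. 2i * pi * radius^2 = 8 pi i for a circle of radius 2.
conjugate_loop = loop_integral(lambda z: z.conjugate(), radius=2.0)

print(analytic_loop, conjugate_loop)
```

Changing `radius` or `centre` leaves the analytic result at zero, while the conjugate result scales with the area of the circle.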

Figure F.2: Example of a contour integral.

However, what if the loop contains a pole? For example, consider $\oint dz\, f$ with $f(z) = f'(z)/(z - z_0)$,
where $f'(z)$ is analytic (here the prime simply labels a second function, not a derivative) and where
the path encircles the point $z_0$, as illustrated in figure F.2. The function $f(z)$ has a singularity at
$z = z_0$, but can be shown to be analytic elsewhere. This means that we can shrink the contour down
to a small circular loop of radius $\epsilon$ enclosing the pole. With $z - z_0 = \epsilon e^{i\varphi}$, so $dz = i\epsilon e^{i\varphi} d\varphi$,
the integral then becomes

$$\oint dz\, f = \oint dz\, \frac{f'(z)}{z - z_0} \simeq f'(z_0) \oint \frac{dz}{z - z_0} = i f'(z_0) \oint d\varphi = 2\pi i f'(z_0) \tag{F.15}$$

where all but the first loop integral are taken around the small circle. This approximation
becomes exact in the limit $\epsilon \to 0$. The quantity $f'(z_0)$ is called the residue of the integrand at
the pole, so the integral is $2\pi i$ times the residue.
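
A short numerical check of this result, and of the invariance under deformation of the contour, is sketched below. The analytic numerator $\cos z$, the pole position and the three circular contours are assumptions made for the example, and `circle_integral` is a hypothetical helper.

```python
import cmath
import math


def circle_integral(f, centre, radius, n=4000):
    """Midpoint-rule estimate of the integral of f(z) dz around an
    anticlockwise circle of the given centre and radius."""
    total = 0j
    dphi = 2 * math.pi / n
    for k in range(n):
        phase = cmath.exp(1j * (k + 0.5) * dphi)
        total += f(centre + radius * phase) * 1j * radius * phase * dphi
    return total


z0 = 0.3 + 0.2j                      # position of the pole


def integrand(z):
    """cos(z) / (z - z0): analytic except for a simple pole at z0."""
    return cmath.cos(z) / (z - z0)


big = circle_integral(integrand, 0j, 1.5)          # large loop enclosing z0
small = circle_integral(integrand, z0, 1e-3)       # tiny loop around z0
outside = circle_integral(integrand, 2 + 0j, 0.5)  # loop not enclosing z0
expected = 2j * math.pi * cmath.cos(z0)            # 2 pi i times the residue

print(big, small, outside, expected)
```

The large and small loops give the same value, $2\pi i \cos z_0$, while the loop that does not enclose the pole gives zero.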